Tackling concept drift by temporal inductive transfer

Author:

George Forman

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 252 - 259

https://doi.org/10.1145/1148170.1148216

Published: 06 August 2006

Abstract

Machine learning is the mainstay for text classification. However, even the most successful techniques are defeated by many real-world applications that have a strong time-varying component. To advance research on this challenging but important problem, we promote a natural experimental framework, the Daily Classification Task, which can be applied to large time-based datasets such as Reuters RCV1. In this paper we dissect concept drift into three main subtypes. We demonstrate via a novel visualization that the recurrent themes subtype is present in RCV1. This understanding led us to develop a new learning model that transfers induced knowledge through time to benefit future classifier learning tasks. The method avoids two main problems with existing work in inductive transfer: scalability and the risk of negative transfer. In empirical tests, it consistently showed more than 10 points F-measure improvement for each of four Reuters categories tested.
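The abstract does not spell out the mechanics, so the following is only a minimal sketch of one way a daily task with knowledge transferred through time could be set up. It assumes, as an illustration rather than as the paper's confirmed method, that the predictions of classifiers trained on earlier days are appended as extra features for the current day's classifier. The names `train_centroid`, `augment`, `daily_task`, and the `window` parameter are hypothetical, and the nearest-centroid learner is a toy stand-in for whatever classifier is actually used.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Model = Callable[[Sequence[float]], int]  # returns 1 (positive) or 0 (negative)
Day = Tuple[List[Vector], List[int], List[Vector], List[int]]


def train_centroid(X: Sequence[Sequence[float]], y: Sequence[int]) -> Model:
    """Toy learner (an assumption, not the paper's): nearest class centroid."""
    dim = len(X[0])

    def centroid(label: int):
        rows = [x for x, t in zip(X, y) if t == label]
        if not rows:
            return None
        return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]

    pos, neg = centroid(1), centroid(0)

    def dist2(a, b) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    def predict(x: Sequence[float]) -> int:
        if pos is None:
            return 0
        if neg is None:
            return 1
        return 1 if dist2(x, pos) < dist2(x, neg) else 0

    return predict


def augment(x: Sequence[float], past_models: Sequence[Model]) -> Vector:
    """Append each past model's prediction on x as one extra feature."""
    return list(x) + [float(m(x)) for m in past_models]


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def daily_task(days: Sequence[Day], window: int = 3) -> List[float]:
    """For each day in time order: train on that day's labeled sample,
    test on that day's held-out documents, and record the F-measure."""
    past: List[Model] = []   # base models from earlier days (raw features only)
    scores: List[float] = []
    for X_train, y_train, X_test, y_test in days:
        recent = past[-window:]  # bounded history keeps the feature count small
        clf = train_centroid([augment(x, recent) for x in X_train], y_train)
        preds = [clf(augment(x, recent)) for x in X_test]
        scores.append(f_measure(y_test, preds))
        past.append(train_centroid(X_train, y_train))
    return scores
```

In this sketch the current day's learner is free to ignore a past model's output if it does not help, which is one plausible way to limit negative transfer, and the `window` bound is one plausible way to keep the cost from growing with the length of the history. Both are assumptions of the sketch, not claims about the paper.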

Index Terms

Tackling concept drift by temporal inductive transfer
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction

Published In

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

August 2006

768 pages

ISBN: 1595933697

DOI: 10.1145/1148170

General Chair: Efthimis N. Efthimiadis (University of Washington)

Program Chairs: Susan Dumais (Microsoft Research, Redmond), David Hawking (CSIRO ICT Centre, Canberra, Australia), Kalervo Järvelin (University of Tampere, Finland)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGIR06

SIGIR06: The 29th Annual International SIGIR Conference

August 6 - 11, 2006

Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Article Metrics

Total Citations: 59
Total Downloads: 1,215

Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 0

Reflects downloads up to 18 Jan 2025
