Tackling concept drift by temporal inductive transfer
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Seattle, Washington, USA
SESSION: Machine learning
Pages: 252-259
Year of Publication: 2006
ISBN: 1-59593-369-7
Author
George Forman, Hewlett-Packard Labs, Palo Alto, CA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1148170.1148216

ABSTRACT

Machine learning is the mainstay for text classification. However, even the most successful techniques are defeated by many real-world applications that have a strong time-varying component. To advance research on this challenging but important problem, we promote a natural, experimental framework, the Daily Classification Task, which can be applied to large time-based datasets, such as Reuters RCV1. In this paper, we dissect concept drift into three main subtypes. We demonstrate via a novel visualization that the recurrent themes subtype is present in RCV1. This understanding led us to develop a new learning model that transfers induced knowledge through time to benefit future classifier learning tasks. The method avoids two main problems with existing work in inductive transfer: scalability and the risk of negative transfer. In empirical tests, it consistently showed more than 10 points F-measure improvement for each of four Reuters categories tested.
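
For readers unfamiliar with the metric, the F-measure reported above combines precision and recall into a single score. The following is a minimal sketch, assuming the balanced variant (F1, the harmonic mean of precision and recall) and a 0-100 scale for "points"; the abstract states neither. The function name and the counts are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true-positive, false-positive,
    and false-negative counts. Assumed variant; the abstract does
    not say which F-measure was used."""
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined),
        # so the score is taken to be 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; not results from the paper.
baseline = f_measure(tp=50, fp=30, fn=50)
improved = f_measure(tp=60, fp=20, fn=40)

# Difference expressed in "points" on an assumed 0-100 scale.
gain_points = 100 * (improved - baseline)
print(f"baseline={baseline:.3f} improved={improved:.3f} gain={gain_points:.1f} points")
```

Under these made-up counts the gain is about 11 points, which is the kind of difference the abstract describes as "more than 10 points".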


