ABSTRACT
Machine learning is the mainstay of text classification. However, even the most successful techniques are defeated by many real-world applications that have a strong time-varying component. To advance research on this challenging but important problem, we promote a natural experimental framework, the Daily Classification Task, which can be applied to large time-based datasets such as Reuters RCV1. In this paper we dissect concept drift into three main subtypes. We demonstrate via a novel visualization that the recurrent-themes subtype is present in RCV1. This understanding led us to develop a new learning model that transfers induced knowledge through time to benefit future classifier learning tasks. The method avoids two main problems with existing work in inductive transfer: scalability and the risk of negative transfer. In empirical tests, it consistently improved F-measure by more than 10 points for each of the four Reuters categories tested.
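The gains above are quoted in F-measure points. Below is a minimal sketch of how such a gain can be computed, assuming the balanced F1 score (the harmonic mean of precision and recall) reported on a 0 to 100 scale, so that one point equals 0.01 on the usual 0 to 1 scale. The function names `f_measure` and `improvement_in_points` and all confusion-matrix counts are hypothetical, for illustration only. They are not the paper's code or results.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall, in [0, 1].

    Assumption: the F-measure in the abstract is F1. tp, fp, fn are the counts
    of true positives, false positives and false negatives for one category.
    """
    if tp == 0:
        # Precision and recall are both 0 (or undefined); use the usual convention F1 = 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def improvement_in_points(f_old: float, f_new: float) -> float:
    """Difference between two F-measures on a 0-100 scale (1 point = 0.01)."""
    return 100.0 * (f_new - f_old)


if __name__ == "__main__":
    # Hypothetical counts for a single category (not data from the paper).
    baseline = f_measure(tp=3, fp=2, fn=2)  # precision 0.6, recall 0.6
    improved = f_measure(tp=7, fp=3, fn=3)  # precision 0.7, recall 0.7
    print(f"baseline F1 = {baseline:.2f}, improved F1 = {improved:.2f}")
    print(f"gain = {improvement_in_points(baseline, improved):.1f} points")
```

Because F1 is a harmonic mean, a classifier cannot score well by maximizing only precision or only recall, which makes it a common choice for rare categories such as those in RCV1.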
CITED BY 9
Fernando Mourão, Leonardo Rocha, Renata Araújo, Thierson Couto, Marcos Gonçalves, Wagner Meira, Jr., Understanding temporal aspects in document classification, Proceedings of the international conference on Web search and web data mining, February 11-12, 2008, Palo Alto, California, USA
Kyosuke Nishida, Koichiro Yamauchi, Learning, detecting, understanding, and predicting concept changes, Proceedings of the 2009 international joint conference on Neural Networks, p.283-290, June 14-19, 2009, Atlanta, Georgia, USA