Mining concept-drifting data streams using ensemble classifiers
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Washington, D.C.
SESSION: Research track
Pages: 226 - 235  
Year of Publication: 2003
ISBN:1-58113-737-0
Authors
Haixun Wang  IBM T. J. Watson Research, Hawthorne, NY
Wei Fan  IBM T. J. Watson Research, Hawthorne, NY
Philip S. Yu  IBM T. J. Watson Research, Hawthorne, NY
Jiawei Han  Univ. of Illinois, Urbana, IL
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/956750.956778

ABSTRACT

Recently, mining data streams with concept drifts for actionable insights has become an important and challenging task for a wide range of applications, including credit card fraud protection, target marketing, and network intrusion detection. Conventional knowledge discovery tools face two challenges: the overwhelming volume of the streaming data and the concept drifts. In this paper, we propose a general framework for mining concept-drifting data streams using weighted ensemble classifiers. We train an ensemble of classification models, such as C4.5, RIPPER, and naive Bayesian, from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the time-evolving environment. Thus, the ensemble approach improves both the efficiency in learning the model and the accuracy in performing classification. Our empirical study shows that the proposed methods have a substantial advantage over single-classifier approaches in prediction accuracy, and that the ensemble framework is effective for a variety of classification models.
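
The chunk-based weighted ensemble described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the names CentroidClassifier and ChunkEnsemble are hypothetical, a toy nearest-centroid learner stands in for base models such as C4.5 or RIPPER, and each classifier's weight is assumed to be its plain accuracy on the newest chunk, used as a proxy for expected accuracy on upcoming data.

```python
from collections import Counter, defaultdict


class CentroidClassifier:
    """Toy base learner (hypothetical stand-in for C4.5, RIPPER, etc.)."""

    def fit(self, X, y):
        # Accumulate per-class feature sums, then divide by class counts.
        counts = Counter(y)
        sums = {}
        for features, label in zip(X, y):
            total = sums.setdefault(label, [0.0] * len(features))
            for j, value in enumerate(features):
                total[j] += value
        self.centroids = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self

    def predict_one(self, features):
        # Return the label whose centroid is closest in squared distance.
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def accuracy(model, X, y):
    """Fraction of examples in (X, y) that the model labels correctly."""
    hits = sum(model.predict_one(f) == label for f, label in zip(X, y))
    return hits / len(y)


class ChunkEnsemble:
    """Keeps at most max_models classifiers, one trained per stream chunk."""

    def __init__(self, max_models=3):
        self.max_models = max_models
        self.members = []  # list of (model, weight) pairs

    def update(self, X, y):
        # Train one new classifier on the newest chunk.
        models = [m for m, _ in self.members]
        models.append(CentroidClassifier().fit(X, y))
        # Assumption: weight = accuracy on the newest chunk. The new model
        # is scored on its own training data here, which is optimistic; a
        # held-out or cross-validated estimate would be more faithful.
        scored = [(m, accuracy(m, X, y)) for m in models]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        # Keep only the highest-weighted classifiers.
        self.members = scored[: self.max_models]

    def predict_one(self, features):
        # Weighted vote: each member adds its weight to its predicted label.
        votes = defaultdict(float)
        for model, weight in self.members:
            votes[model.predict_one(features)] += weight
        return max(votes, key=votes.get)
```

In this sketch, calling update once per incoming chunk lets classifiers trained on an outdated concept lose weight, or drop out of the ensemble entirely, while the stream is never revisited as a whole.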


