ABSTRACT
Recently, mining data streams with concept drifts for actionable insights has become an important and challenging task for a wide range of applications, including credit card fraud protection, target marketing, and network intrusion detection. Conventional knowledge discovery tools face two challenges: the overwhelming volume of the streaming data and the concept drifts. In this paper, we propose a general framework for mining concept-drifting data streams using weighted ensemble classifiers. We train an ensemble of classification models, such as C4.5, RIPPER, and naive Bayesian, from sequential chunks of the data stream. The classifiers in the ensemble are judiciously weighted based on their expected classification accuracy on the test data under the time-evolving environment. Thus, the ensemble approach improves both the efficiency in learning the model and the accuracy in performing classification. Our empirical study shows that the proposed methods have a substantial advantage over single-classifier approaches in prediction accuracy, and that the ensemble framework is effective for a variety of classification models.
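The chunk-based weighting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not say how expected accuracy is estimated, so the sketch assumes that accuracy on the most recent labeled chunk serves as a proxy for accuracy on upcoming test data. The names `Stump`, `WeightedChunkEnsemble`, `max_models`, and `accuracy` are hypothetical, and the one-dimensional threshold stump merely stands in for a real base learner such as C4.5 or RIPPER.

```python
class Stump:
    """Toy base learner: a threshold rule on a single numeric feature."""

    def fit(self, xs, ys):
        best = None
        for t in sorted(set(xs)):
            for pol in (0, 1):
                # Predict `pol` at or above the threshold, the other label below it.
                preds = [pol if x >= t else 1 - pol for x in xs]
                err = sum(p != y for p, y in zip(preds, ys))
                if best is None or err < best[0]:
                    best = (err, t, pol)
        _, self.t, self.pol = best
        return self

    def predict(self, xs):
        return [self.pol if x >= self.t else 1 - self.pol for x in xs]


def accuracy(model, xs, ys):
    preds = model.predict(xs)
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


class WeightedChunkEnsemble:
    """Keeps up to `max_models` classifiers, one trained per data chunk."""

    def __init__(self, make_model, max_models=3):
        self.make_model = make_model
        self.max_models = max_models
        self.models = []
        self.weights = []

    def update(self, xs, ys):
        # Train a new member on the newest chunk only.
        self.models.append(self.make_model().fit(xs, ys))
        # Assumption: weight every member by its accuracy on the newest chunk.
        # The newest member is scored on its own training data here, which is
        # optimistic; a held-out or cross-validated estimate would be fairer.
        scored = [(accuracy(m, xs, ys), m) for m in self.models]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        scored = scored[: self.max_models]  # drop the weakest members
        self.weights = [w for w, _ in scored]
        self.models = [m for _, m in scored]

    def predict(self, xs):
        votes = [m.predict(xs) for m in self.models]
        out = []
        for i in range(len(xs)):
            score = {}
            for w, v in zip(self.weights, votes):
                score[v[i]] = score.get(v[i], 0.0) + w
            out.append(max(score, key=score.get))  # weighted majority vote
        return out


# Two chunks with opposite concepts simulate an abrupt drift.
xs = list(range(10))
chunk_old = (xs, [0] * 5 + [1] * 5)   # label 1 when x >= 5
chunk_new = (xs, [1] * 5 + [0] * 5)   # after drift: label 1 when x < 5

ens = WeightedChunkEnsemble(Stump, max_models=3)
ens.update(*chunk_old)
ens.update(*chunk_new)
print(ens.weights)
print(ens.predict([2, 7]))
```

In this toy run the classifier trained before the drift scores poorly on the newest chunk, so its vote carries little weight and the ensemble follows the new concept without retraining on the full history.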