ABSTRACT
Current models of the classification problem do not effectively handle bursts of particular classes arriving at different times. Instead, they concentrate on methods for one-pass classification modeling of very large data sets. Our model treats data stream classification as a dynamic process in which simultaneous training and test streams are used to classify data sets. This model reflects real-life situations, since it is desirable to classify test streams in real time while both the training and test streams evolve. The aim is to create a classification system in which the training model can adapt quickly to changes in the underlying data stream. To achieve this goal, we propose an on-demand classification process that dynamically selects the appropriate window of past training data to build the classifier. The empirical results indicate that the system maintains high classification accuracy in an evolving data stream while providing an efficient solution to the classification task.
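The idea of choosing a training window on demand can be illustrated with a minimal sketch. The abstract does not specify how the window is chosen or which classifier is built, so everything below is an assumption made for illustration: the nearest-centroid classifier, the selection rule (hold out the newest labeled points and keep the window length that labels them best), and the hypothetical names `centroids`, `predict`, `select_horizon`, and `classify_on_demand`.

```python
from collections import defaultdict
from math import dist


def centroids(points):
    """Per-class mean vectors for a list of (features, label) pairs."""
    sums, counts = {}, defaultdict(int)
    for x, y in points:
        if y not in sums:
            sums[y] = list(x)
        else:
            sums[y] = [a + b for a, b in zip(sums[y], x)]
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda y: dist(model[y], x))


def select_horizon(train, horizons, holdout=10):
    """Pick the window length that best labels the newest training points.

    train    -- list of (features, label) pairs, oldest first
    horizons -- candidate window lengths (positive ints), in points
    holdout  -- number of newest labeled points used for validation
                (an assumed selection rule, not taken from the paper)
    """
    if len(train) <= holdout:
        raise ValueError("need more training points than the holdout size")
    past, recent = train[:-holdout], train[-holdout:]
    best_h, best_acc = None, -1.0
    for h in horizons:
        model = centroids(past[-h:])  # train only on the last h past points
        acc = sum(predict(model, x) == y for x, y in recent) / len(recent)
        if acc > best_acc:  # ties keep the earlier candidate
            best_h, best_acc = h, acc
    return best_h, best_acc


def classify_on_demand(train, x, horizons, holdout=10):
    """Select a window at request time, then classify x with it."""
    h, _ = select_horizon(train, horizons, holdout)
    return predict(centroids(train[-h:]), x)
```

When the class distributions shift, a short window that contains only post-shift points should score higher on the held-out points than a long window that mixes old and new behavior, so the sketch falls back to recent data; on a stable stream the candidates score similarly and the choice matters less.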