Categorizing and mining concept drifting data streams
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Las Vegas, Nevada, USA
Session: Research papers
Pages 812-820  
Year of Publication: 2008
ISBN:978-1-60558-193-4
Authors
Peng Zhang  Chinese Academy of Sciences, Beijing, China
Xingquan Zhu  Florida Atlantic University, Boca Raton, FL, USA
Yong Shi  University of Nebraska at Omaha, Omaha, NE, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1401890.1401987

ABSTRACT

Mining concept drifting data streams is a defining challenge for data mining research. Recent years have seen a large body of work on detecting changes and building prediction models from stream data, but with only a vague understanding of the types of concept drifting and of the impact that different types have on mining algorithms. In this paper, we first categorize concept drifting into two scenarios, Loose Concept Drifting (LCD) and Rigorous Concept Drifting (RCD), and then propose a solution for each. For LCD data streams, because concepts in adjacent data chunks are sufficiently close to each other, we apply the kernel mean matching (KMM) method to minimize the discrepancy between data chunks in the kernel space. This minimization produces weighted instances, which are used to build a classifier ensemble that handles concept drifting data streams. For RCD data streams, because genuine concepts in adjacent data chunks may change randomly and rapidly, we propose a new Optimal Weights Adjustment (OWA) method to determine the optimum weight values for classifiers trained from the most recent (up-to-date) data chunk, such that those classifiers form an accurate classifier ensemble for predicting instances in the yet-to-come data chunk. Experiments on synthetic and real-world datasets show that the weighted instance approach is preferable when concept drifting is mainly caused by changes in the class prior probability, whereas the weighted classifier approach is preferable when concept drifting is mainly triggered by changes in the conditional probability.
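
The weighted classifier idea mentioned above can be illustrated with a minimal sketch. This is not the paper's OWA method, whose details the abstract does not give; it is a generic accuracy-weighted vote, substituted here for illustration. The classifiers (clf_a, clf_b, clf_c), the data chunk, and the function names are all hypothetical.

```python
from collections import defaultdict

def accuracy_weights(classifiers, chunk):
    """Weight each classifier by its accuracy on a labeled data chunk."""
    weights = []
    for clf in classifiers:
        correct = sum(1 for x, y in chunk if clf(x) == y)
        weights.append(correct / len(chunk))
    return weights

def weighted_vote(classifiers, weights, x):
    """Return the label whose supporting classifiers carry the most total weight."""
    scores = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)

# Hypothetical one-dimensional classifiers standing in for models
# trained on earlier parts of the stream.
def clf_a(x):
    return 1 if x > 0.5 else 0

def clf_b(x):
    return 1 if x > 0.2 else 0

def clf_c(x):
    return 0

# Hypothetical most recent labeled chunk: (feature, label) pairs.
recent_chunk = [(0.1, 0), (0.3, 0), (0.6, 1), (0.9, 1)]

classifiers = [clf_a, clf_b, clf_c]
weights = accuracy_weights(classifiers, recent_chunk)

# Predict an instance from the yet-to-come chunk.
prediction = weighted_vote(classifiers, weights, 0.4)
```

In this toy setting the classifier that agrees best with the most recent chunk dominates the vote. A method such as OWA would instead solve for the weights directly, which this sketch does not attempt.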



