OcVFDT: one-class very fast decision tree for one-class classification of data streams
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data
Paris, France
SESSION: Short research papers
Pages 79-86  
Year of Publication: 2009
ISBN: 978-1-60558-668-7
Authors
Chen Li  Northwest A&F University, Yangling, Shaanxi Province, P.R. China
Yang Zhang  Northwest A&F University, Yangling, Shaanxi Province, P.R. China
Xue Li  The University of Queensland, Brisbane, Queensland, Australia
Sponsors
Cooperating Objects Network of Excellence (CONET)
Geographic Information Science and Technology (GIST) Group at Oak Ridge National Laboratory
Computational Sciences and Engineering (CSE) Division at the Oak Ridge National Laboratory
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1601966.1601981

ABSTRACT

Current research on data stream classification focuses mainly on supervised learning, which requires a fully labeled data stream for training. However, fully labeled data streams are expensive to obtain, which makes the supervised learning approach difficult to apply in real-life applications. In this paper, we model applications such as credit fraud detection and intrusion detection as a one-class data stream classification problem. The cost of labeling the data stream is reduced because users only need to provide some positive samples together with the unlabeled samples to the learner. Based on VFDT and POSC4.5, we propose the OcVFDT (One-class Very Fast Decision Tree) algorithm. An experimental study on both synthetic and real-life datasets shows that OcVFDT has excellent classification performance. Even when 80% of the samples in the data stream are unlabeled, the classification performance of OcVFDT is still very close to that of VFDT trained on a fully labeled stream.
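
The one-class setting described in the abstract can be illustrated with a small sketch that turns a fully labeled stream into a stream of positive and unlabeled samples. This is not the OcVFDT algorithm itself. The function name `to_one_class_stream`, the parameter `hide_positive_prob`, and the masking rule (negatives always lose their label, each positive loses its label with a fixed probability) are assumptions made for illustration and may differ from the protocol used in the paper.

```python
import random
from typing import Iterable, Iterator, Optional, Tuple

# A labeled sample: (feature tuple, true label), where 1 = positive, 0 = negative.
LabeledSample = Tuple[tuple, int]
# A one-class sample: label is 1 for a known positive, None for unlabeled.
OneClassSample = Tuple[tuple, Optional[int]]


def to_one_class_stream(
    stream: Iterable[LabeledSample],
    hide_positive_prob: float,
    seed: int = 0,
) -> Iterator[OneClassSample]:
    """Mask labels so that only some positive samples stay labeled.

    Hypothetical helper: negatives are always emitted as unlabeled, and
    each positive is emitted as unlabeled with probability
    `hide_positive_prob` (an assumed masking rule, not the paper's).
    """
    rng = random.Random(seed)
    for features, label in stream:
        if label == 1 and rng.random() >= hide_positive_prob:
            yield features, 1      # retained positive label
        else:
            yield features, None   # unlabeled (true class hidden)


if __name__ == "__main__":
    # Toy stream: odd-indexed samples are positive.
    toy = [((i,), i % 2) for i in range(10)]
    for features, label in to_one_class_stream(toy, hide_positive_prob=0.8):
        print(features, label)
```

A learner in this setting never sees a negative label: it receives a few samples marked positive and must treat everything else as a mixture of hidden positives and negatives.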

