Accurate decision trees for mining high-speed data streams
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Washington, D.C.
POSTER SESSION: Research track
Pages: 523 - 528  
Year of Publication: 2003
ISBN:1-58113-737-0
Authors
João Gama  Univ. do Porto, R. do Campo Alegre 823, 4150 Porto, Portugal
Ricardo Rocha  Projecto Matemática Ensino, 3810 Aveiro, Portugal
Pedro Medas  Univ. do Porto, R. do Campo Alegre 823, 4150 Porto, Portugal
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/956750.956813

ABSTRACT

In this paper we study the problem of constructing accurate decision tree models from data streams. Mining data streams is an incremental task that requires incremental, online, and any-time learning algorithms. One of the most successful algorithms for mining data streams is VFDT. We extend the VFDT system in two directions: the ability to deal with continuous data and the use of more powerful classification techniques at tree leaves. The proposed system, VFDTc, can incorporate and classify new information online, with a single scan of the data, in constant time per example. The most relevant property of our system is its ability to achieve performance similar to that of a standard decision tree algorithm even on medium-sized datasets. This is relevant because of the any-time property. We study the behaviour of VFDTc on different problems and demonstrate its utility on large and medium-sized data sets. In a bias-variance analysis, we observe that VFDTc reduces the variance component in comparison to C4.5.
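
The abstract does not spell out how a VFDT-style learner decides when to split a leaf after seeing only part of the stream. As a rough illustration, the sketch below uses the Hoeffding bound, the statistical test on which the original VFDT is based. It is not taken from this paper: the function names `hoeffding_bound` and `should_split`, the parameter names, and the example values are assumptions made for illustration, and the VFDTc extensions (continuous attributes, classifiers at the leaves) are not shown.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of its mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 n_classes: int, delta: float, n: int) -> bool:
    """Decide whether the best attribute is reliably better than the runner-up.

    Assumes the split measure is information gain, whose range is
    log2(n_classes). Hypothetical helper, not the authors' code.
    """
    value_range = math.log2(n_classes)
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > epsilon


# With few examples the bound is wide, so the leaf keeps collecting statistics;
# with more examples the same observed gap becomes sufficient to split.
print(should_split(0.30, 0.10, n_classes=2, delta=1e-7, n=100))
print(should_split(0.30, 0.10, n_classes=2, delta=1e-7, n=1000))
```

Because the test only needs running counts at each leaf, each example can be processed once and then discarded, which is consistent with the single-scan, constant-time-per-example behaviour described above.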


