Accurate decision trees for mining high-speed data streams
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Washington, D.C.
POSTER SESSION: Research track
Pages: 523 - 528
Year of Publication: 2003
ISBN: 1-58113-737-0
Authors
João Gama, Univ. do Porto, R. do Campo Alegre 823, 4150 Porto, Portugal
Ricardo Rocha, Projecto Matemática Ensino, 3810 Aveiro, Portugal
Pedro Medas, Univ. do Porto, R. do Campo Alegre 823, 4150 Porto, Portugal
Bibliometrics
Downloads (6 Weeks): 20, Downloads (12 Months): 125, Citation Count: 13
ABSTRACT
In this paper we study the problem of constructing accurate decision tree models from data streams. Mining data streams is an incremental task that requires incremental, online, and any-time learning algorithms. One of the most successful algorithms for mining data streams is VFDT. We extend the VFDT system in two directions: the ability to deal with continuous data and the use of more powerful classification techniques at tree leaves. The proposed system, VFDTc, can incorporate and classify new information online, with a single scan of the data, in constant time per example. The most relevant property of our system is its ability to obtain performance similar to that of a standard decision tree algorithm even for medium-sized datasets. This is relevant because of the any-time property. We study the behaviour of VFDTc on different problems and demonstrate its utility on large and medium-sized datasets. Under a bias-variance analysis we observe that VFDTc, in comparison to C4.5, is able to reduce the variance component.
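For readers unfamiliar with VFDT, the system the abstract builds on: VFDT grows a tree from a stream by splitting a leaf only once the Hoeffding bound indicates that the best candidate attribute is very likely better than the runner-up. The following is a minimal sketch of that test, not the authors' implementation. The function names `hoeffding_bound` and `should_split` and all parameter names are hypothetical, and the sketch assumes information gain as the split measure, whose range is log2 of the number of classes.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound epsilon (hypothetical helper name).

    With probability 1 - delta, the true mean of a variable whose values
    span `value_range` lies within epsilon of the mean of n observations.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n_classes, delta, n):
    """Return True when the observed gain difference exceeds the bound.

    Assumption: the split measure is information gain, so its range is
    log2(n_classes).
    """
    value_range = math.log2(n_classes)
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > epsilon


if __name__ == "__main__":
    # The same gain difference is accepted only once enough examples
    # have reached the leaf, because epsilon shrinks as n grows.
    for n in (100, 1000, 10000):
        print(n, should_split(0.30, 0.10, n_classes=2, delta=1e-6, n=n))
```

Because the bound shrinks as more examples arrive, a leaf needs only per-attribute counts rather than stored examples, which is consistent with the single-scan, constant-time-per-example behaviour the abstract describes.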
CITED BY 13
Yabo Xu, Ke Wang, Ada Wai-Chee Fu, Rong She, Jian Pei, Classification spanning correlated data streams, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA