ACM Home Page
Please provide us with feedback. Feedback
Digital Library logoTake a look at the new version of this page: [ beta version ]. Tell us what you think.
Forest trees for on-line data
Full text PdfPdf (197 KB)
Source Symposium on Applied Computing archive
Proceedings of the 2004 ACM symposium on Applied computing table of contents
Nicosia, Cyprus
SESSION: Data streams (DS) table of contents
Pages: 632 - 636  
Year of Publication: 2004
ISBN:1-58113-812-1
Authors
João Gama  LIACC, FEP, Univ. do Porto, R. do Campo Alegre 823, Porto, Portugal
Pedro Medas  LIACC, Univ. do Porto, R. do Campo Alegre 823, Porto, Portugal
Ricardo Rocha  Campus Universitário de, Santiago, Aveiro, Portugal
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 36,   Citation Count: 6
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/967900.968033
What is a DOI?

ABSTRACT

This paper presents an hybrid adaptive system for induction of forest of trees from data streams. The Ultra Fast Forest Tree system (UFFT) is an incremental algorithm, with constant time for processing each example, works online, and uses the Hoeffding bound to decide when to install a splitting test in a leaf leading to a decision node. Our system has been designed for continuous data. It uses analytical techniques to choose the splitting criteria, and the information gain to estimate the merit of each possible splitting-test. The number of examples required to evaluate the splitting criteria is sound, based on the Hoeffding bound. For multiclass problems, the algorithm builds a binary tree for each possible pair of classes, leading to a forest of trees. During the training phase the algorithm maintains a short term memory. Given a data stream, a fixed number of the most recent examples are maintained in a data-structure that supports constant time insertion and deletion. When a test is installed, a leaf is transformed into a decision node with two descendant leaves. The sufficient statistics of these leaves are initialized with the examples in the short term memory that will fall at these leaves. We study the behavior of UFFT in different problems. The experimental results shows that UFFT is competitive against a batch decision tree learner in large and medium datasets.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
C. Blake, E. Keogh, and C. J. Merz. UCI repository of Machine Learning databases, 1999.
 
2
Leo Breiman. Random forests. Technical report, University of Berkeley, 2002.
3
 
4
 
5
 
6
Jonathan Gratch. Sequential inductive learning. In Proc. Thirteenth National Conference on Artificial Intelligence, volume 1, pages 779--786, 1996.
7
 
8
 
9
J. Kittler. Combining classifiers: A theoretical framework. Pattern analysis and Applications, Vol. 1, No. 1, 1998.
 
10
R. Kohavi. Scaling up the accuracy of naive Bayes classifiers: a decision tree hybrid. In Proc. 2nd International Conference on Knowledge Discovery and Data Mining. AAAI Press, 1996.
 
11
W.-Y. Loh and Y.-S. Shih. Split selection methods for classification trees. Statistica Sinica, 1997.
 
12
 
13
 
14
P. Utgoff. Perceptron trees - a case study in hybrid concept representation. In Proc. Seventh National Conference on Artificial Intelligence. Morgan Kaufmann, 1988.
 
15


Collaborative Colleagues:
João Gama: colleagues
Pedro Medas: colleagues
Ricardo Rocha: colleagues