A streaming ensemble algorithm (SEA) for large-scale classification
Full text: PDF (511 KB)
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
San Francisco, California
Pages: 377-382
Year of Publication: 2001
ISBN: 1-58113-391-X
Authors
W. Nick Street, University of Iowa, Iowa City, IA
YongSeog Kim, University of Iowa, Iowa City, IA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
AAAI: American Association for Artificial Intelligence
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/502512.502568

ABSTRACT

Ensemble methods have recently garnered a great deal of attention in the machine learning community. Techniques such as Boosting and Bagging have proven to be highly effective but require repeated resampling of the training data, making them inappropriate in a data mining context. The methods presented in this paper take advantage of plentiful data, building separate classifiers on sequential chunks of training points. These classifiers are combined into a fixed-size ensemble using a heuristic replacement strategy. The result is a fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift.
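
The chunk-by-chunk scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the replacement heuristic or the base learner, so everything here is an assumption made for illustration: the names `StreamingEnsemble`, `train_stump`, `accuracy`, and `make_chunk` are hypothetical, a one-feature threshold rule stands in for a decision tree, each new classifier is scored on the chunk that follows its training chunk, and it replaces the least accurate ensemble member only if it scores strictly higher.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[Sequence[float], int]          # (feature vector, class label)
Classifier = Callable[[Sequence[float]], int]


def train_stump(chunk: List[Point]) -> Classifier:
    """Stand-in base learner (assumption): best single-feature threshold rule."""
    majority = Counter(y for _, y in chunk).most_common(1)[0][0]
    best_correct, best_rule = -1, (0, float("inf"), majority, majority)
    for f in range(len(chunk[0][0])):
        for x, _ in chunk:
            t = x[f]
            left = Counter(y for xi, y in chunk if xi[f] <= t)
            right = Counter(y for xi, y in chunk if xi[f] > t)
            l_label, l_hits = left.most_common(1)[0]
            r_label, r_hits = right.most_common(1)[0] if right else (majority, 0)
            if l_hits + r_hits > best_correct:
                best_correct = l_hits + r_hits
                best_rule = (f, t, l_label, r_label)
    f, t, l_label, r_label = best_rule
    return lambda x: l_label if x[f] <= t else r_label


def accuracy(clf: Classifier, chunk: List[Point]) -> float:
    return sum(clf(x) == y for x, y in chunk) / len(chunk)


class StreamingEnsemble:
    """Fixed-size ensemble updated one chunk at a time (illustrative only)."""

    def __init__(self, max_size: int, train=train_stump):
        self.max_size = max_size
        self.train = train
        self.members: List[Classifier] = []
        self.pending: Optional[Classifier] = None  # trained on the previous chunk

    def update(self, chunk: List[Point]) -> None:
        # Score the previous chunk's classifier on this unseen chunk, then
        # add it or swap it for the weakest member (assumed heuristic).
        if self.pending is not None:
            if len(self.members) < self.max_size:
                self.members.append(self.pending)
            else:
                scores = [accuracy(m, chunk) for m in self.members]
                worst = min(range(len(scores)), key=scores.__getitem__)
                if accuracy(self.pending, chunk) > scores[worst]:
                    self.members[worst] = self.pending
        self.pending = self.train(chunk)  # only the current chunk is in memory

    def predict(self, x: Sequence[float]) -> int:
        voters = self.members or ([self.pending] if self.pending is not None else [])
        if not voters:
            raise ValueError("no classifier has been trained yet")
        return Counter(clf(x) for clf in voters).most_common(1)[0][0]


def make_chunk(drifted: bool) -> List[Point]:
    """Toy 1-D chunk; the labelling rule flips when `drifted` is True."""
    xs = [(i + 0.5) / 10 for i in range(10)]
    return [([x], int((x > 0.5) != drifted)) for x in xs]


if __name__ == "__main__":
    ens = StreamingEnsemble(max_size=3)
    for drifted in [False] * 4 + [True] * 4:
        ens.update(make_chunk(drifted))
    print(ens.predict([0.9]), ens.predict([0.1]))
```

Because only the current chunk and at most `max_size` classifiers (plus one pending candidate) are held at a time, memory stays roughly constant as the stream grows. Members trained on an outdated concept score poorly on new chunks and are swapped out, which is how this simplified version adapts to drift.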

