ABSTRACT
Ensemble methods have recently garnered a great deal of attention in the machine learning community. Techniques such as Boosting and Bagging have proven to be highly effective but require repeated resampling of the training data, making them inappropriate in a data mining context. The methods presented in this paper take advantage of plentiful data, building separate classifiers on sequential chunks of training points. These classifiers are combined into a fixed-size ensemble using a heuristic replacement strategy. The result is a fast algorithm for large-scale or streaming data that classifies as well as a single decision tree built on all the data, requires approximately constant memory, and adjusts quickly to concept drift.
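The chunk-by-chunk procedure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's actual method: the names `Stump`, `ChunkEnsemble`, and `accuracy` are hypothetical, a one-feature threshold stump stands in for a decision tree, and the replacement heuristic is plain accuracy on the newest chunk. Each classifier is trained on one chunk and judged on the next, so it is never scored on its own training data.

```python
from collections import Counter


class Stump:
    """One-feature threshold classifier; a stand-in for a decision tree."""

    def fit(self, X, y):
        best = None
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [lab for row, lab in zip(X, y) if row[f] <= t]
                right = [lab for row, lab in zip(X, y) if row[f] > t]
                l_lab = Counter(left).most_common(1)[0][0]
                r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
                errors = (sum(lab != l_lab for lab in left)
                          + sum(lab != r_lab for lab in right))
                if best is None or errors < best[0]:
                    best = (errors, f, t, l_lab, r_lab)
        _, self.f, self.t, self.left, self.right = best
        return self

    def predict_one(self, row):
        return self.left if row[self.f] <= self.t else self.right


def accuracy(model, X, y):
    """Fraction of rows in the chunk that the model labels correctly."""
    return sum(model.predict_one(r) == lab for r, lab in zip(X, y)) / len(y)


class ChunkEnsemble:
    """Fixed-size ensemble fed one chunk at a time (illustrative only)."""

    def __init__(self, max_size=5, make_model=Stump):
        self.max_size = max_size
        self.make_model = make_model
        self.members = []
        self.pending = None  # classifier trained on the previous chunk

    def update(self, X, y):
        # Judge the previous chunk's classifier on the new, unseen chunk.
        if self.pending is not None:
            self._consider(self.pending, X, y)
        # Train a fresh classifier on the new chunk only; the chunk can
        # then be discarded, so memory use stays roughly constant.
        self.pending = self.make_model().fit(X, y)

    def _consider(self, candidate, X, y):
        if len(self.members) < self.max_size:
            self.members.append(candidate)
            return
        # Assumed heuristic: replace the member that is least accurate on
        # the newest chunk, but only if the candidate beats it.
        scores = [accuracy(m, X, y) for m in self.members]
        worst = min(range(len(scores)), key=scores.__getitem__)
        if accuracy(candidate, X, y) > scores[worst]:
            self.members[worst] = candidate

    def predict_one(self, row):
        # Majority vote; before any member is admitted, use the pending model.
        voters = self.members or [self.pending]
        votes = Counter(m.predict_one(row) for m in voters)
        return votes.most_common(1)[0][0]
```

Because members are always scored on the most recent chunk, classifiers built under an outdated concept are the first to be replaced once the data shifts, which is one way a fixed-size ensemble can track concept drift.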