ABSTRACT
Bagging and boosting are two popular ensemble methods that typically achieve better accuracy than a single classifier. These techniques have limitations on massive data sets, because the size of the data set can be a bottleneck. Voting many classifiers built on small subsets of data ("pasting small votes") is a promising approach for learning from massive data sets, one that can utilize the power of boosting and bagging. We propose a framework for building hundreds or thousands of such classifiers on small subsets of data in a distributed environment. Experiments show this approach is fast, accurate, and scalable.
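The idea of voting many classifiers built on small subsets can be illustrated with a minimal sketch. It is not the authors' framework: the nearest-centroid base learner, the uniform random subset selection, the toy two-class data, and all function names and parameter values below are assumptions made for illustration. Each small classifier is trained independently of the others, which is what would allow the training to be spread across nodes in a distributed environment.

```python
import random
from collections import Counter


def train_centroid(rows):
    """Fit a nearest-centroid classifier on (features, label) pairs.

    The base learner is an assumption chosen to keep the sketch dependency-free.
    """
    sums, counts = {}, Counter()
    for x, y in rows:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


def paste_small_votes(data, n_classifiers=200, subset_size=20, seed=0):
    """Train many small classifiers, each on its own small random subset.

    Every iteration is independent, so in a distributed setting each one
    could run on a different node. Uniform random sampling is an assumption.
    """
    rng = random.Random(seed)
    return [train_centroid(rng.sample(data, subset_size)) for _ in range(n_classifiers)]


def vote(ensemble, x):
    """Combine the small classifiers by unweighted majority vote."""
    return Counter(predict_centroid(m, x) for m in ensemble).most_common(1)[0][0]


def make_toy_data(n=2000, seed=1):
    """Hypothetical data: two Gaussian blobs centred at (0, 0) and (3, 3)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 3.0 * label
        data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    ensemble = paste_small_votes(data)
    print(vote(ensemble, [0.0, 0.0]), vote(ensemble, [3.0, 3.0]))
```

No single classifier in this sketch sees more than 20 examples, so memory per training job stays small however large the full data set is; only the final vote touches all the models.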
CITED BY 12
Larry Shoemaker, Robert E. Banfield, Lawrence O. Hall, Kevin W. Bowyer, W. Philip Kegelmeyer, Using classifier ensembles to label spatially disjoint data, Information Fusion, v.9 n.1, p.120-133, January 2008
Ping Luo, Hui Xiong, Kevin Lü, Zhongzhi Shi, Distributed classification in peer-to-peer networks, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA