The distributed boosting algorithm
Full text: PDF (625 KB)
Source: International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
San Francisco, California
Pages: 311-316
Year of Publication: 2001
ISBN: 1-58113-391-X
Authors
Aleksandar Lazarevic, Temple University, Philadelphia, PA
Zoran Obradovic, Temple University, Philadelphia, PA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
AAAI: American Association for Artificial Intelligence
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9; Downloads (12 Months): 54; Citation Count: 4
DOI: http://doi.acm.org/10.1145/502512.502557

ABSTRACT

In this paper, we propose a general framework for distributed boosting intended for efficiently integrating specialized classifiers learned over very large, distributed, homogeneous databases that cannot be merged at a single location. Our distributed boosting algorithm can also be used as a parallel classification technique, in which a massive database that cannot fit into main memory is partitioned into disjoint subsets for more efficient analysis. In the proposed method, at each boosting round the classifiers are first learned from the disjoint datasets and then exchanged among the sites. Finally, the classifiers are combined into a weighted voting ensemble on each disjoint data set. The ensemble that is applied to an unseen test set therefore represents an ensemble of ensembles built on all distributed sites. In experiments performed on four large data sets, the proposed distributed boosting method achieved classification accuracy comparable to, or even slightly better than, that of the standard boosting algorithm while requiring less memory and less computational time. In addition, the communication overhead of the distributed boosting algorithm is very small, making it a viable alternative to standard boosting for large-scale databases.
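The round structure described in the abstract (learn locally, exchange classifiers, combine them by weighted voting, repeat) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes binary labels in {-1, +1}, exhaustively fitted decision stumps as the base learner, AdaBoost-style vote weights in the manner of Freund and Schapire, and that sites share only the classifiers plus a few scalars (weighted errors and weight sums). All function names (`train_stump`, `global_error`, `distributed_boost`, `predict`) and the toy three-site dataset are hypothetical.

```python
import math
from typing import List, Tuple

# Illustrative sketch only. Assumptions: decision stumps, AdaBoost-style weights,
# scalar-only exchange of errors and weight sums. Not the paper's exact method.
Stump = Tuple[int, float, int]              # (feature index, threshold, polarity)
Site = Tuple[List[List[float]], List[int]]  # local (X, y), labels in {-1, +1}


def stump_predict(s: Stump, x: List[float]) -> int:
    f, thr, pol = s
    return pol if x[f] > thr else -pol


def train_stump(X: List[List[float]], y: List[int], w: List[float]) -> Stump:
    """Exhaustively fit the stump with the lowest weighted error on one site's data."""
    best, best_err = (0, 0.0, 1), float("inf")
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((f, thr, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (f, thr, pol), err
    return best


def global_error(h, sites: List[Site], weights: List[List[float]]) -> float:
    """Each site scores h on its own data; only the scalar sums are combined."""
    err = sum(wi for (X, y), w in zip(sites, weights)
              for xi, yi, wi in zip(X, y, w) if h(xi) != yi)
    return min(max(err, 1e-10), 1 - 1e-10)  # clamp to keep the log finite


def alpha_of(err: float) -> float:
    return max(0.0, 0.5 * math.log((1 - err) / err))  # no vote if worse than chance


def weighted_vote(members, x) -> int:
    """members: list of (alpha, predict_fn); ties go to +1."""
    return 1 if sum(a * h(x) for a, h in members) >= 0 else -1


def distributed_boost(sites: List[Site], rounds: int = 5):
    n = sum(len(y) for _, y in sites)
    weights = [[1.0 / n] * len(y) for _, y in sites]  # one global distribution, split by site
    model = []  # list of (round alpha, round ensemble predict_fn)
    for _ in range(rounds):
        # 1) Each site learns a classifier from its own weighted data.
        stumps = [train_stump(X, y, w) for (X, y), w in zip(sites, weights)]
        # 2) Classifiers are exchanged, so every site can score all of them locally.
        members = []
        for s in stumps:
            h = lambda x, s=s: stump_predict(s, x)
            members.append((alpha_of(global_error(h, sites, weights)), h))
        # 3) The round's classifiers form one weighted voting ensemble.
        round_h = lambda x, m=members: weighted_vote(m, x)
        err = global_error(round_h, sites, weights)
        if err >= 0.5:
            break
        alpha = alpha_of(err)
        model.append((alpha, round_h))
        if err <= 1e-10:
            break  # the round ensemble is already perfect on the training data
        # 4) AdaBoost-style reweighting, applied locally at every site.
        for (X, y), w in zip(sites, weights):
            for i, (xi, yi) in enumerate(zip(X, y)):
                w[i] *= math.exp(-alpha * yi * round_h(xi))
        z = sum(sum(w) for w in weights)  # only per-site sums need to be exchanged
        weights = [[wi / z for wi in w] for w in weights]
    return model


def predict(model, x: List[float]) -> int:
    """Ensemble of ensembles: an alpha-weighted vote over the per-round ensembles."""
    return weighted_vote(model, x)


# Hypothetical toy data: label is +1 when x > 0.5, split across three disjoint sites.
data = [([i / 30.0], 1 if i / 30.0 > 0.5 else -1) for i in range(30)]
sites = [([x for x, _ in data[k::3]], [y for _, y in data[k::3]]) for k in range(3)]
model = distributed_boost(sites, rounds=3)
accuracy = sum(predict(model, x) == y for x, y in data) / len(data)
```

In this sketch, `predict` combines two levels of voting: each round contributes one weighted-vote ensemble of the site classifiers, and the rounds are in turn combined by a weighted vote. Only the classifiers and a handful of scalars per round cross site boundaries, which keeps the communication cost of the sketch low.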
