ACM Home Page
Please provide us with feedback. Feedback
Mining complex models from arbitrarily large databases in constant time
Full text PdfPdf (854 KB)
Source International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Edmonton, Alberta, Canada
POSTER SESSION: Poster papers table of contents
Pages: 525 - 531  
Year of Publication: 2002
ISBN:1-58113-567-X
Authors
Geoff Hulten  University of Washington, Seattle, WA
Pedro Domingos  University of Washington, Seattle, WA
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
: AAAI
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 40,   Citation Count: 11
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/775047.775124
What is a DOI?

ABSTRACT

In this paper we propose a scaling-up method that is applicable to essentially any induction algorithm based on discrete search. The result of applying the method to an algorithm is that its running time becomes independent of the size of the database, while the decisions made are essentially identical to those that would be made given infinite data. The method works within pre-specified memory limits and, as long as the data is iid, only requires accessing it sequentially. It gives anytime results, and can be used to produce batch, stream, time-changing and active-learning versions of an algorithm. We apply the method to learning Bayesian networks, developing an algorithm that is faster than previous ones by orders of magnitude, while achieving essentially the same predictive performance. We observe these gains on a series of large databases "generated from benchmark networks, on the KDD Cup 2000 e-commerce data, and on a Web log containing 100 million requests.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
 
4
P. Domingos and G. Hulten. Learning from infinite data in finite time. In Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA, 2002.
 
5
N. Friedman, I. Nachman, and D. Peér. Learning Bayesian network structure from massive datasets: The "sparse candidate" algorithm. In Proc. 15th Conf. on Uncertainty in Artificial Intelligence, pp. 206--215, Stockholm, Sweden, 1999.
 
6
N. Friedman and Z. Yakhini. On the sample complexity of learning Bayesian networks. In Proc. 12th Conf. on Uncertainty in Artificial Intelligence, pp. 274--282, Portland, OR, 1996.
 
7
 
8
 
9
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13--30, 1963.
 
10
G. Hulten and P. Domingos. A general method for scaling up learning algorithms and its application to Bayesian networks. Technical report, Department of Computer Science and Engineering, University of Washington, Seattle, WA, 2002.
11
12
 
13
O. Maron and A. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. In Advances in Neural Information Processing Systems 6. Morgan Kaufmann, San Mateo, CA, 1994.
 
14
15
 
16
 
17
A. Wald. Sequential analysis. Wiley, New York, 1947.

CITED BY  11

Collaborative Colleagues:
Geoff Hulten: colleagues
Pedro Domingos: colleagues