|
ABSTRACT
In this paper we propose a scaling-up method that is applicable to essentially any induction algorithm based on discrete search. The result of applying the method to an algorithm is that its running time becomes independent of the size of the database, while the decisions made are essentially identical to those that would be made given infinite data. The method works within pre-specified memory limits and, as long as the data is iid, only requires accessing it sequentially. It gives anytime results, and can be used to produce batch, stream, time-changing and active-learning versions of an algorithm. We apply the method to learning Bayesian networks, developing an algorithm that is faster than previous ones by orders of magnitude, while achieving essentially the same predictive performance. We observe these gains on a series of large databases "generated from benchmark networks, on the KDD Cup 2000 e-commerce data, and on a Web log containing 100 million requests.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
|
| |
4
|
P. Domingos and G. Hulten. Learning from infinite data in finite time. In Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA, 2002.
|
| |
5
|
N. Friedman, I. Nachman, and D. Peér. Learning Bayesian network structure from massive datasets: The "sparse candidate" algorithm. In Proc. 15th Conf. on Uncertainty in Artificial Intelligence, pp. 206--215, Stockholm, Sweden, 1999.
|
| |
6
|
N. Friedman and Z. Yakhini. On the sample complexity of learning Bayesian networks. In Proc. 12th Conf. on Uncertainty in Artificial Intelligence, pp. 274--282, Portland, OR, 1996.
|
| |
7
|
|
| |
8
|
|
| |
9
|
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13--30, 1963.
|
| |
10
|
G. Hulten and P. Domingos. A general method for scaling up learning algorithms and its application to Bayesian networks. Technical report, Department of Computer Science and Engineering, University of Washington, Seattle, WA, 2002.
|
 |
11
|
|
 |
12
|
|
| |
13
|
O. Maron and A. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. In Advances in Neural Information Processing Systems 6. Morgan Kaufmann, San Mateo, CA, 1994.
|
| |
14
|
|
 |
15
|
Foster Provost , David Jensen , Tim Oates, Efficient progressive sampling, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.23-32, August 15-18, 1999, San Diego, California, United States
[doi> 10.1145/312129.312188]
|
| |
16
|
|
| |
17
|
A. Wald. Sequential analysis. Wiley, New York, 1947.
|
CITED BY 11
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Thomas Seidl , Ira Assent , Philipp Kranen , Ralph Krieger , Jennifer Herrmann, Indexing density models for incremental learning and anytime classification on data streams, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|