|
ABSTRACT
We propose an algorithm called query by commitee, in which a committee of students is trained on the same data set. The next query is chosen according to the principle of maximal disagreement. The algorithm is studied for two toy models: the high-low game and perceptron learning of another perceptron. As the number of queries goes to infinity, the committee algorithm yields asymptotically finite information gain. This leads to generalization error that decreases exponentially with the number of examples. This in marked contrast to learning from randomly chosen inputs, for which the information gain approaches zero and the generalization error decreases with a relatively slow inverse power law. We suggest that asymptotically finite information gain may be an important characteristic of good query algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
Bau91
|
E. Baum. Neural net algorithms that learn in polynomial time from examples and queries. IEEE Trans. in Neural Networks, 2:5-19, 1991.
|
| |
Fed72
|
V. V. Fedorov. Theory of Optimal Experiments. Academic Press, New York, 1972.
|
| |
GD89
|
E. Gardner and B. Derrida. Three unfinished works on the optimal storage capacity of networks. J. Phys., A22:1983-1994, 1989.
|
| |
GT90
|
G. GySrgyi and N. Tishby. Statistical theory oxc learn{rig a rule. In W. I~. rI)heumann and R. KSberle, editors, Neural Networks and Spin Glasses, pages 3-36, Singapore, 1990. World Scientific.
|
| |
HKS91
|
David Haussler , Michael Kearns , Robert Schapire, Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension, Proceedings of the fourth annual workshop on Computational learning theory, p.61-74, August 05-07, 1991, Santa Cruz, California, United States
|
| |
KR90
|
W. Kinzel and P. Ruj~n. Improving a network generalization ability by selecting examples. Europhys. Lett., 13:473-477, 1990.
|
| |
MP88
|
M. L. Minsky and S. Papert. Percep~rons. MIT Press, Cambridge, expanded edition, 1988.
|
| |
OH91
|
M. Opper and D. Haussler. Generalization performance of bayes optimal classification algorithm for learning a perceptron. Phys. Rev. Left., 66:2677-2680, 1991.
|
| |
SST92
|
H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Phys. Rev., A45:6056-6091, 1992.
|
| |
TLS89
|
N. Tishby, E. Levin, and S. Solla. Consistent inference of probabilities in layered networks: Predictions and generalization. In Proc. Int. Joint Conf. on Neural Networks, volume 2, pages 403-409, Washington, DC, 1989. IEEE.
|
| |
WR92
|
T.L.H. Watkin and A. Rau. Selecting examples for perceptrons. J. Phys., A25:113-121, 1992.
|
CITED BY 101
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Vijay S. Iyengar , Chidanand Apte , Tong Zhang, Active learning using adaptive resampling, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.91-98, August 20-23, 2000, Boston, Massachusetts, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jens Kaae Christensen , Kasper Lamberth , Morten Nielsen , Claus Lundegaard , Peder Worning , Sanne Lise Lauemøller , Søren Buus , Søren Brunak , Ole Lund, Selecting informative data for developing peptide-MHC binding predictors using a query by committee approach, Neural Computation, v.15 n.12, p.2931-2942, December 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Li Zhang , ShiXia Liu , Yue Pan , LiPing Yang, InfoAnalyzer: a computer-aided tool for building enterprise taxonomies, Proceedings of the thirteenth ACM international conference on Information and knowledge management, November 08-13, 2004, Washington, D.C., USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Steven C. H. Hoi , Rong Jin , Jianke Zhu , Michael R. Lyu, Batch mode active learning and its application to medical image classification, Proceedings of the 23rd international conference on Machine learning, p.417-424, June 25-29, 2006, Pittsburgh, Pennsylvania
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Robbie Haertel , Eric Ringger , Kevin Seppi , James Carroll , Peter McClanahan, Assessing the costs of sampling methods in active learning for annotation, Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, June 16-17, 2008, Columbus, Ohio
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Zhenyu Lu , Anand I. Rughani , Bruce I. Tranmer , Josh Bongard, Informative sampling for large unbalanced data sets, Proceedings of the 2008 GECCO conference companion on Genetic and evolutionary computation, July 12-16, 2008, Atlanta, GA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Gang Xiao , Finnegan Southey , Robert C. Holte , Dana Wilkinson, Software testing by active learning for commercial games, Proceedings of the 20th national conference on Artificial intelligence, p.898-903, July 09-13, 2005, Pittsburgh, Pennsylvania
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Weiming Hu , Wei Hu , Nianhua Xie , Steve Maybank, Unsupervised active learning based on hierarchical graph-theoretic clustering, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, v.39 n.5, p.1147-1161, October 2009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Eric Ringger , Peter McClanahan , Robbie Haertel , George Busby , Marc Carmen , James Carroll , Kevin Seppi , Deryle Lonsdale, Active learning for part-of-speech tagging: accelerating corpus annotation, Proceedings of the Linguistic Annotation Workshop, p.101-108, June 28-29, 2007, Prague, Czech Republic
|
|
|
|
|