|
ABSTRACT
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictins. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently know in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
BIRGE, L., AND MASSART, P. 1993. Rates of convergence for minimum contrast estimators. Prob. Theory Rel. Fields 97, 113-150.
|
 |
2
|
|
 |
3
|
Nicolò Cesa-Bianchi , Yoav Freund , David P. Helmbold , David Haussler , Robert E. Schapire , Manfred K. Warmuth, How to use expert advice, Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, p.382-391, May 16-18, 1993, San Diego, California, United States
[doi> 10.1145/167088.167198]
|
| |
4
|
|
 |
5
|
|
| |
6
|
COVER, T.M. 1965. Behaviour of sequential predictors of binary sequences. In Transactions of the 4th Prague Conference on Information Theory, Statistical Decision Functions, Random Processes. Publishing House of the Czechoslovak Academy of Sciences.
|
| |
7
|
COVER, T. M., AND SHANAR, A. 1977. Compound Bayes predictors for sequences with apparent Markov structure. IEEE Trans. Syst. Man Cybernet. SMC-7, 6 (June), 421-424.
|
| |
8
|
DAWID, A. P. 1984. Statistical theory: The prequential approach. J. Roy. Stat. Soc., Series A, 278-292.
|
| |
9
|
DAWlD, A. 1991. Prequential analysis, stochastic complexity and Bayesian inference. In Bayesian Statistics, vol. 4. Oxford University Press, pp. 109-125.
|
| |
10
|
DAWlD, A.P. 1996. Prequential data analysis. Curt. Iss. Stat. Inference, to appear.
|
| |
11
|
Alfredo DeSantis , George Markowsky , Mark N. Wegman, Learning probabilistic prediction functions, Proceedings of the first annual workshop on Computational learning theory, p.312-328, August 03-05, 1988, MIT, Cambridge, Massachusetts, United States
|
| |
12
|
FEDER, M., MERHAV, N., AND GUTMAN, M. 1992. Universal prediction of individual sequences. IEEE Trans. Inf. Theory 38, 1258-1270.
|
| |
13
|
Amos Fiat , Dean P. Foster , Howard Karloff , Yuval Rabani , Yiftach Ravid , Sundar Vishwanathan, Competitive algorithms for layered graph traversal, Proceedings of the 32nd annual symposium on Foundations of computer science, p.288-297, September 1991, San Juan, Puerto Rico
[doi> 10.1109/SFCS.1991.185381]
|
| |
14
|
Amos Fiat , Richard M. Karp , Michael Luby , Lyle A. McGeoch , Daniel D. Sleator , Neal E. Young, Competitive paging algorithms, Journal of Algorithms, v.12 n.4, p.685-699, Dec. 1991
[doi> 10.1016/0196-6774(91)90041-V]
|
| |
15
|
|
| |
16
|
GALAMBOS, J. 1987. The Asymptotic Theory of Extreme Order Statistics, 2nd ed. R. E. Kreiger.
|
| |
17
|
HANNAN, J. 1957. Approximation to Bayes risk in repeated play. In Contributions to the Theory of Games, vol. 3. Princeton University Press, Princeton, N.J., pp. 97-139.
|
| |
18
|
HAUSSLER, D., AND BARRON, A. 1992. How well do Bayes methods work for on-line prediction of { + 1, -1 } values? In Proceedings of the 3rd NEC Symposium on Computation and Cognition. SIAM.
|
| |
19
|
|
| |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
| |
24
|
|
| |
25
|
|
| |
26
|
|
| |
27
|
|
| |
28
|
|
| |
29
|
|
| |
30
|
|
| |
31
|
MERHAV, N., AND FEDER, M. 1993. Universal schemes for sequential decision for individual data sequences. IEEE Trans. Inf. Theory, 39, 4, 1280-1292.
|
| |
32
|
RISSANEN, J. 1978. Modeling by shortest data description. Automatica 14, 465-471.
|
| |
33
|
RISSANEN, J. 1986. Stochastic complexity and modeling. Ann. Star. 14, 3, 1080-1100.
|
| |
34
|
RISSANEN, J., AND LANGDON, G. G., JR. 1981. Universal modeling and coding. IEEE Trans. Inf. Theory IT-27, 1 (Jan.), 12-23.
|
| |
35
|
SEUNG, n. S., SOMPOLINSKY, n., AND TISHBY, N. 1992. Statistical mechanics of learning from examples. Phys. Rev A 45, 8, 6056-6091.
|
| |
36
|
SHTARKOV, YU. M. 1975. Coding of discrete sources with unknown statistics. In Topics in Information Theory. North-Holland, Amsterdam, The Netherlands, pp. 559-574.
|
| |
37
|
SHTARKOV, YU. M. 1987. Universal sequential coding of single messages. Prob. Inf. Transm. 23 (July-September), 175-186.
|
| |
38
|
SOMPOLINSKY, H., TISHBY, N., AND SEUNG, H. S. 1992. Learning from examples in large neural networks. Phys. Rev. Lett. 65, 1683-1686.
|
| |
39
|
STONE, C. J. 1977. Cross-validation: A review. Math. Operationforsch. Statist. Ser Statist. 9, 127-139.
|
| |
40
|
TALAGRAND, M. 1994. Sharper bounds for Gaussian and empirical processes. Ann. Prob. 22, 1, 28-76.
|
 |
41
|
|
| |
42
|
|
| |
43
|
VAPNIK, V. 1992. Principles of risk minimization for learning theory. In Advances in Neural Information Processing Systems, vol. 4. John E. Moody, Steve J. Hanson, and Richard P. Lippman, eds. Morgan Kaufmann, San Mateo, Calif.
|
| |
44
|
|
| |
45
|
|
| |
46
|
VOVK, V.G. 1993. A logic of probability, with application to the foundations of statistics. J. Roy. Statis Soc, Ser. B-Methodolical 55, 2, 317-351.
|
| |
47
|
|
CITED BY 60
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yiling Chen , Chao-Hsien Chu , Tracy Mullen , David M. Pennock, Information markets vs. opinion pools: an empirical comparison, Proceedings of the 6th ACM conference on Electronic commerce, p.58-67, June 05-08, 2005, Vancouver, BC, Canada
|
|
|
Nir Ailon , Bernard Chazelle , Seshadhri Comandur , Ding Liu, Self-improving algorithms, Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, p.261-270, January 22-26, 2006, Miami, Florida
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Marco Barreno , Blaine Nelson , Russell Sears , Anthony D. Joseph , J. D. Tygar, Can machine learning be secure?, Proceedings of the 2006 ACM Symposium on Information, computer and communications security, March 21-24, 2006, Taipei, Taiwan
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Carlos Diuk , Lihong Li , Bethany R. Leffler, The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning, Proceedings of the 26th Annual International Conference on Machine Learning, p.249-256, June 14-18, 2009, Montreal, Quebec, Canada
|
|
|
|
|
|
|
REVIEW
"Giuseppina Carla Gini : Reviewer"
While the title of this paper suggests the use of traditional
expert systems technology, the research reported here concerns how to
make predictions when a team of experts is available and their
predictions are known.
In particular
more...
|