ACM Home Page
Please provide us with feedback. Feedback
Anytime algorithms for multi-armed bandit problems
Full text PdfPdf (266 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm table of contents
Miami, Florida
Pages: 928 - 936  
Year of Publication: 2006
ISBN:0-89871-605-5
Author
Robert Kleinberg  Cornell Univesity, Ithaca, NY
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
: SIAM Activity Group on Discrete Mathematics
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 63,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1109557.1109659
What is a DOI?

ABSTRACT

How should a decision-maker perform repeated choices so as to optimize the average cost or benefit of those choices in the long run? This question motivates the theory of online learning, which encompasses problems such as the well-known best-expert [13, 9] and multi-armed bandit [10, 1] problems. This paper concerns a new approach to dealing with multi-armed bandit problems in which the decision-maker's strategy set is large (exponential or possibly infinite). Recent theoretical progress on the analysis of algorithms for such problems (e.g. [2, 3, 8, 11, 14]) has led to improved online algorithms for problems in areas such as online routing [2], dynamic pricing mechanisms [4, 5, 12], and analysis of reputation systems in e-commerce and peer-to-peer networks [3].


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
B. Awerbuch and R. Kleinberg, Competitive collaborative learning, in Proceedings of the 18th Annual Conference on Learning Theory (COLT), 2005. To appear.
 
4
 
5
 
6
M. Boddy, Anytime problem solving using dynamic programming, Proceedings of National Conference on Artificial Intelligence (AAAI), (1991).
 
7
E. Cope, Regret and convergence bounds for immediate-reward reinforcement learning with continuous action spaces, 2004. Unpublished manuscript.
 
8
 
9
 
10
J. C. Gittins and D. M. Jones, A dynamic allocation index for the sequential design of experiments, in Progress in Statistics, J. G. et al., ed., North-Holland, 1974, pp. 241--266.
 
11
R. Kleinberg, Nearly tight bounds for the continuum-armed bandit problem, in Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, eds., MIT Press, Cambridge, MA, 2005, pp. 697--704.
 
12
 
13
 
14
H. B. McMahan and A. Blum, Online geometric optimization in the bandit setting against an adaptive adversary, in Proceedings of the 17th Annual Conference on Learning Theory (COLT), vol. 3120 of Lecture Notes in Computer Science, Springer Verlag, 2004, pp. 109--123.
 
15
S. Zilberstein, Operational rationality through compilation of anytime algorithms, 1993.