ACM Home Page
Please provide us with feedback. Feedback
Robbing the bandit: less regret in online geometric optimization against an adaptive adversary
Full text PdfPdf (251 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm table of contents
Miami, Florida
Pages: 937 - 943  
Year of Publication: 2006
ISBN:0-89871-605-5
Authors
Varsha Dani  University of Chicago
Thomas P. Hayes  University of California at Berkeley
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
: SIAM Activity Group on Discrete Mathematics
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 27,   Citation Count: 7
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1109557.1109660
What is a DOI?

ABSTRACT

We consider "online bandit geometric optimization," a problem of iterated decision making in a largely unknown and constantly changing environment. The goal is to minimize "regret," defined as the difference between the actual loss of an online decision-making procedure and that of the best single decision in hindsight. "Geometric optimization" refers to a generalization of the well-known multi-armed bandit problem, in which the decision space is some bounded subset of Rd, the adversary is restricted to linear loss functions, and regret bounds should depend on the dimensionality d, rather than the total number of possible decisions. "Bandit" refers to the setting in which the algorithm is only told its loss on each round, rather than the entire loss function.McMahan and Blum [10] presented the best known algorithm in this setting, and proved that its expected additive regret is O(poly(d)T3/4). We simplify and improve their analysis of this algorithm to obtain regret O(poly(d)T2/3).We also prove that, for a large class of full-information online optimization problems, the optimal regret against an adaptive adversary is the same as against a non-adaptive adversary.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
 
4
V. Dani and T. P. Hayes. Beating the adversarial multiarmed bandit through improved hedging. (working title) Manuscript, 2005.
 
5
 
6
J. Hannan. Approximation to Bayes risk in repeated play. In: Contributions to the Theory of Games, vol III, 97--139, M. Dresher, A. W. Tucker and P. Wolfe, editors, Princeton University Press (1957).
 
7
T. Hayes. An Azuma-Hoeffding inequality for vector-valued martingales. Manuscript (2003).
 
8
A. Kalai and S. Vempala. Efficient algorithms for online optimization. In: Proceedings of the 16th annual Conference on Learning Theory (2003).
 
9
 
10
H. B. McMahan and A. Blum. Online geometric optimization in the bandit setting against an adaptive adversary. In: Proceedings of the 17th annual Conference on Learning Theory (2004) 109--123.
 
11
H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society. 55 (1952) 527--535.
 
12
M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of the 20th International Conference on Machine Learning (2003).


Collaborative Colleagues:
Varsha Dani: colleagues
Thomas P. Hayes: colleagues