ACM Home Page
Please provide us with feedback. Feedback
Online convex optimization in the bandit setting: gradient descent without a gradient
Full text PdfPdf (850 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms table of contents
Vancouver, British Columbia
SESSION: Session 4C table of contents
Pages: 385 - 394  
Year of Publication: 2005
ISBN:0-89871-585-7
Authors
Abraham D. Flaxman  Carnegie Mellon University
Adam Tauman Kalai  Toyota Technical Institute at Chicago
H. Brendan McMahan  Carnegie Mellon University
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
: SIAM Activity Group on Discrete Mathematics
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 48,   Citation Count: 10
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

We study a general online convex optimization problem. We have a convex set S and an unknown sequence of cost functions c1, c2,..., and in each period, we choose a feasible point xt in S, and learn the cost ct(xt). If the function ct is also revealed after each period then, as Zinkevich shows in [25], gradient descent can be used on these functions to get regret bounds of O(√n). That is, after n rounds, the total cost incurred will be O(√n) more than the cost of the best single feasible decision chosen with the benefit of hindsight, minx Σ ct(x).We extend this to the "bandit" setting, where, in each period, only the cost ct(xt) is revealed, and bound the expected regret as O(n3/4).Our approach uses a simple approximation of the gradient that is computed from evaluating ct at a single (random) point. We show that this biased estimate is sufficient to approximate gradient descent on the sequence of functions. In other words, it is possible to use gradient descent without seeing anything more than the value of the functions at a single point. The guarantees hold even in the most general case: online against an adaptive adversary.For the online linear optimization problem [15], algorithms with low regrets in the bandit setting have recently been given against oblivious [1] and adaptive adversaries [19]. In contrast to these algorithms, which distinguish between explicit explore and exploit periods, our algorithm can be interpreted as doing a small amount of exploration in each period.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
4
 
5
J. Birge and F. Loveaux. Introduction to Stochastic Programming. Springer Verlag, New York, 1997.
 
6
Thomas Cover. Universal Portfolios. In Mathematical Finance1, 1--29, 1991.
 
7
A. Frieze and R. Kannan. Log-Sobolev inequalities and sampling from log-concave distributions. Annals of Applied Probability, 9 (1999), 14--26.
 
8
O. N. Granichin. Stochastic approximation with input perturbation for dependent observation disturbances. Vestn. Leningrad. Gos Univ., 1989, Ser. 1, no. 4, 27--31.
 
9
 
10
David P. Helmbold, Robert E. Schapire, Yoram Singer, and Manfred K. Warmuth. On-line portfolio selection using multiplicative updates. In Mathematical Finance, 8(4):325--347, 1998.
 
11
L. Khachiyan. A polynomial algorithm in linear programming (in Russian). Doklady Akedamii Nauk SSSR, 244, 1093--1096, 1979.
 
12
R. Kleinberg. Nearly Tight Bounds for the Continuum Armed Bandit Problem. To appear in Proc. 18th Annual Conf. on Neural Information Processing Systems, 2004.
 
13
S. Kirkpatrick, C. D. Gelatt Jr., M. P. Vecchi. Optimization by Simulated Annealing. Science, 220, 4598, 671--680, 1983.
 
14
 
15
A. Kalai and S. Vempala. Efficient algorithms for the online decision problem. In Proceedings of the 16th Conference on Computational Learning Theory, 2003.
 
16
 
17
 
18
 
19
H. Brendan McMahan and Avrim Blum. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary. In Proceedings of the 17th Annual Conference on Learning Theory, COLT 2004.
 
20
N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, E. Teller. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys., 21, 6, 1087--1092. 1953.
 
21
V. D. Milman and A. Pajor. Isotropic position and inertia ellipsoids and zonoids of the unit ball of a normed n-dimensional space. Lecture Notes in Mathematics 1376, Springer, Berlin (1989), 64--104.
 
22
 
23
 
24
 
25
M. Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. In Proceedings of the Twentieth International Conference on Machine Learning, pp. 928--936, 2003.

CITED BY  12
 
 
 
 
 
 
Collaborative Colleagues:
Abraham D. Flaxman: colleagues
Adam Tauman Kalai: colleagues
H. Brendan McMahan: colleagues