ACM Home Page
Please provide us with feedback. Feedback
The ratio index for budgeted learning, with applications
Full text PdfPdf (482 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the Nineteenth Annual ACM -SIAM Symposium on Discrete Algorithms table of contents
New York, New York
Pages 18-27  
Year of Publication: 2009
Authors
Ashish Goel  Stanford University
Sanjeev Khanna  University of Pennsylvania, Philadelphia PA
Brad Null  Stanford University
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 65,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

In the budgeted learning problem, we are allowed to experiment on a set of alternatives (given a fixed experimentation budget) with the goal of picking a single alternative with the largest possible expected payoff. Constant factor approximation algorithms for this problem were developed by Guha and Munagala by rounding a linear program that couples the various alternatives together. In this paper we present an index for this problem, which we call the ratio index, which also guarantees a constant factor approximation. Index-based policies have the advantage that a single number (i.e. the index) can be computed for each alternative irrespective of all other alternatives, and the alternative with the highest index is experimented upon. This is analogous to the famous Gittins index for the discounted multi-armed bandit problem.

The ratio index has several interesting structural properties. First, we show that it can be computed in strongly polynomial time. Second, we show that with the appropriate discount factor, the Gittins index and our ratio index are constant factor approximations of each other, and hence the Gittins index also gives a constant factor approximation to the budgeted learning problem. Finally, we show that the ratio index can be used to create an index-based policy that achieves an O(1)-approximation for the finite horizon version of the multi-armed bandit problem. Moreover, the policy does not require any knowledge of the horizon (whereas we compare its performance against an optimal strategy that is aware of the horizon). This yields the following surprising result: there is an index-based policy that achieves an O(1)-approximation for the multi-armed bandit problem, oblivious to the underlying discount factor.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
P. Bhattacharya, L. Georgiadis, and P. Tsoucas. Extended Polymatroids, Properties, and Optimization. Integer Programming and Combinatorial Optimization, IPCO2, ed. E. Bala, G Cornnejols, and R. Kannan, Carnegie-Mellon University 298--315, 1992.
 
2
R. Bellman. A Problem in the Sequential Design of Experiments. Sankhia, 16:221--229, 1956.
 
3
 
4
 
5
A. Blum and Y. Mansour. Learning, Regret Minimization, and Equilibria. Algorithmic Game Theory, ed. N. Nisan, T. Roughgarden, E. Tardos, and V. Vazirani. 79--102, 2007.
 
6
M. Charikar, C. Chekuri, and M. Pal. Sampling Bounds for Stochastic Optimization. APPROX-RANDOM 257--269, 2005.
 
7
 
8
 
9
E. Frostig and G. Weiss. Four Proofs of Gittins' Multiarmed Bandit Theorem. Applied Probability Trust, November 10, 1999.
 
10
J. C. Gittins and D. M. Jones. A Dynamic Allocation Index for the Sequential Design of Experiments. Progress in Statistics: European Meeting of Statisticians, Budapest, 1972, ed. J. Ganu, K. Sarkadi, and I. Vince. 241--266, 1974.
 
11
J. C. Gittins. Bandit Processes and Dynamic Allocation Indices. J Royal Statistical Societe Series B, 14:148--167, 1979.
 
12
J. C. Gittins. Multiarmed Bandits Allocation Indices, Wiley, New York, 1989.
 
13
K. D. Glazebrook and R. Garbe. Almost Optimal Policies for Stochastic Systems which Almost Satisfy Conservation Laws. Annals of Operations Research, 92:19--43, 1999.
 
14
K. D. Glazebrook and D. J. Wilkinson. Index-Based Policies for Discounted Multi-armed Bandits on Parallel Machines. The Annals of Applied Probability, 10:3:877--896, 2000.
15
 
16
 
17
A. Goel, S. Khanna, and B. Null. The Ratio Index for Budgeted Learning, with Applications. Math arXiv:0810.0558v1, http://arxiv.org/abs/0810.0558.
 
18
19
 
20
 
21
D. Luenberger. Linear and Nonlinear Programming, Reading, MA, 1984.
 
22
23
 
24
J. Schneider and A. Moore. Active Learning in Discrete Input Spaces. Proceedings of the 34th Interface Symposium, 2002.
 
25
 
26
P. Tsoucas. The Region of Achievable Performance in a Model of Klimov. Research Report RC16543, IBM T. J. Watson Research Center, Yorktown Heights, New York, 1991.
 
27
A. Wald. Sequential Analysis, J. Wiley & Sons, New York, 212p, 1947.
 
28
P. Whittle. Restless Bandits: Activity Allocation in a Changing World. A Celebration of Applied Probability. J. Applied Probability, 25A:287--298, 1988.

Collaborative Colleagues:
Ashish Goel: colleagues
Sanjeev Khanna: colleagues
Brad Null: colleagues