ACM Home Page
Please provide us with feedback. Feedback
Approximation algorithms for restless bandit problems
Full text PdfPdf (511 KB)
Source Symposium on Discrete Algorithms archive
Proceedings of the Nineteenth Annual ACM -SIAM Symposium on Discrete Algorithms table of contents
New York, New York
Pages 28-37  
Year of Publication: 2009
Authors
Sudipto Guha  University of Pennsylvania, Philadelphia PA
Kamesh Munagala  Duke University, Durham NC
Peng Shi  Duke University, Durham NC
Publisher
Society for Industrial and Applied Mathematics  Philadelphia, PA, USA
Bibliometrics
Downloads (6 Weeks): 26,   Downloads (12 Months): 133,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  

ABSTRACT

In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-Hard to approximate to any non-trivial factor, and little progress has been made on this problem despite its significance in modeling activity allocation under uncertainty. We make progress on this problem by showing that for an interesting and general subclass that we term Monotone bandits, a surprisingly simple and intuitive greedy policy yields a factor 2 approximation. Such greedy policies are termed index policies, and are popular due to their simplicity and their optimality for the stochastic multi-armed bandit problem. The Monotone bandit problem strictly generalizes the stochastic multi-armed bandit problem, and naturally models multi-project scheduling where the state of a project becomes increasingly uncertain when the project is not scheduled. We develop several novel techniques in the design and analysis of the index policy. Our algorithm proceeds by introducing a novel "balance" constraint to the dual of a well-known LP relaxation to the restless bandit problem. This is followed by a structural characterization of the optimal solution by using both the exact primal as well as dual complementary slackness conditions. This yields an interpretation of the dual variables as potential functions from which we derive the index policy and the associated analysis.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
D. Adelman and A. J. Mersereau. Relaxations of weakly coupled stochastic dynamic programs. Operations Research, 2008.
 
2
 
3
 
4
 
5
 
6
 
7
8
9
 
10
 
11
J. C. Gittins and D. M. Jones. A dynamic allocation index for the sequential design of experiments. Progress in statistics (European Meeting of Statisticians), 1972.
12
 
13
14
 
15
 
16
 
17
S. Guha, K. Munagala, and S. Sarkar. Information acquisition and exploitation in multichannel wireless systems. CoRR, arXiv:0804.1724, 2008.
18
 
19
J. T. Hawkins. A Langrangian decomposition approach to weakly coupled dynamic optimization problems and its applications. PhD thesis, Operations Research Center, Massachusetts Institute of Technology, 2003.
 
20
T. Javidi, B. Krishnamachari, Q. Zhao, and M. Liu. Optimality of myopic sensing in multi-channel opportunistic access. In Proc. IEEE International Conf. in Communications (ICC '08), 2008.
 
21
S. M. Kakade and M. J. Kearns. Trading in markovian price models. In COLT, pages 606--620, 2005.
 
22
R. Levi, M. Pál, R. Roundy, and D. B. Shmoys. Approximation algorithms for stochastic inventory control models. In IPCO, pages 306--320, 2005.
 
23
 
24
 
25
K. Munagala and P. Shi. The stochastic machine replenishment problem. In IPCO, 2008.
 
26
J. Le Ny, M. Dahleh, and E. Feron. Multi-UAV dynamic routing with partial observations using restless bandits allocation indices. In Proceedings of the 2008 American Control Conference, 2008.
 
27
 
28
A. Slivkins and E. Upfal. Adapting to a changing environment: The Brownian restless bandits. In COLT, 2008.
 
29
J. N. Tsitsiklis. A short proof of the Gittins index theorem. Annals of Appl. Prob., 4(1):194--199, 1994.
 
30
 
31
G. Weiss. Branching bandit processes. Probab. Engng. Inform. Sci., 2:269--278, 1988.
 
32
P. Whittle. Arm acquiring bandits. Ann. Probab., 9:284--292, 1981.
 
33
P. Whittle. Restless bandits: Activity allocation in a changing world. Appl. Prob., 25(A):287--298, 1988.
 
34
N. E. Young. The k-server dual and loose competitiveness for paging. Algorithmica, 11(6):525--541, 1994.
 
35
Q. Zhao, B. Krishnamachari, and K. Liu. On myopic sensing for multi-channel opportunistic access, 2007. CoRR, arXiv:0712.0035, 2007.

Collaborative Colleagues:
Sudipto Guha: colleagues
Kamesh Munagala: colleagues
Peng Shi: colleagues