|
ABSTRACT
In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit problem is known to be PSPACE-Hard to approximate to any non-trivial factor, and little progress has been made on this problem despite its significance in modeling activity allocation under uncertainty. We make progress on this problem by showing that for an interesting and general subclass that we term Monotone bandits, a surprisingly simple and intuitive greedy policy yields a factor 2 approximation. Such greedy policies are termed index policies, and are popular due to their simplicity and their optimality for the stochastic multi-armed bandit problem. The Monotone bandit problem strictly generalizes the stochastic multi-armed bandit problem, and naturally models multi-project scheduling where the state of a project becomes increasingly uncertain when the project is not scheduled. We develop several novel techniques in the design and analysis of the index policy. Our algorithm proceeds by introducing a novel "balance" constraint to the dual of a well-known LP relaxation to the restless bandit problem. This is followed by a structural characterization of the optimal solution by using both the exact primal as well as dual complementary slackness conditions. This yields an interpretation of the dual variables as potential functions from which we derive the index policy and the associated analysis.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
D. Adelman and A. J. Mersereau. Relaxations of weakly coupled stochastic dynamic programs. Operations Research, 2008.
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
|
 |
8
|
Nicolò Cesa-Bianchi , Yoav Freund , David Haussler , David P. Helmbold , Robert E. Schapire , Manfred K. Warmuth, How to use expert advice, Journal of the ACM (JACM), v.44 n.3, p.427-485, May 1997
[doi> 10.1145/258128.258179]
|
 |
9
|
|
| |
10
|
|
| |
11
|
J. C. Gittins and D. M. Jones. A dynamic allocation index for the sequential design of experiments. Progress in statistics (European Meeting of Statisticians), 1972.
|
 |
12
|
|
| |
13
|
|
 |
14
|
|
| |
15
|
|
| |
16
|
|
| |
17
|
S. Guha, K. Munagala, and S. Sarkar. Information acquisition and exploitation in multichannel wireless systems. CoRR, arXiv:0804.1724, 2008.
|
 |
18
|
|
| |
19
|
J. T. Hawkins. A Langrangian decomposition approach to weakly coupled dynamic optimization problems and its applications. PhD thesis, Operations Research Center, Massachusetts Institute of Technology, 2003.
|
| |
20
|
T. Javidi, B. Krishnamachari, Q. Zhao, and M. Liu. Optimality of myopic sensing in multi-channel opportunistic access. In Proc. IEEE International Conf. in Communications (ICC '08), 2008.
|
| |
21
|
S. M. Kakade and M. J. Kearns. Trading in markovian price models. In COLT, pages 606--620, 2005.
|
| |
22
|
R. Levi, M. Pál, R. Roundy, and D. B. Shmoys. Approximation algorithms for stochastic inventory control models. In IPCO, pages 306--320, 2005.
|
| |
23
|
|
| |
24
|
|
| |
25
|
K. Munagala and P. Shi. The stochastic machine replenishment problem. In IPCO, 2008.
|
| |
26
|
J. Le Ny, M. Dahleh, and E. Feron. Multi-UAV dynamic routing with partial observations using restless bandits allocation indices. In Proceedings of the 2008 American Control Conference, 2008.
|
| |
27
|
|
| |
28
|
A. Slivkins and E. Upfal. Adapting to a changing environment: The Brownian restless bandits. In COLT, 2008.
|
| |
29
|
J. N. Tsitsiklis. A short proof of the Gittins index theorem. Annals of Appl. Prob., 4(1):194--199, 1994.
|
| |
30
|
|
| |
31
|
G. Weiss. Branching bandit processes. Probab. Engng. Inform. Sci., 2:269--278, 1988.
|
| |
32
|
P. Whittle. Arm acquiring bandits. Ann. Probab., 9:284--292, 1981.
|
| |
33
|
P. Whittle. Restless bandits: Activity allocation in a changing world. Appl. Prob., 25(A):287--298, 1988.
|
| |
34
|
N. E. Young. The k-server dual and loose competitiveness for paging. Algorithmica, 11(6):525--541, 1994.
|
| |
35
|
Q. Zhao, B. Krishnamachari, and K. Liu. On myopic sensing for multi-channel opportunistic access, 2007. CoRR, arXiv:0712.0035, 2007.
|
|