| Multi-armed bandit problems with dependent arms |
| Full text |
Pdf
(235 KB)
|
| Source
|
ICML; Vol. 227
archive
Proceedings of the 24th international conference on Machine learning
table of contents
Corvalis, Oregon
Pages: 721 - 728
Year of Publication: 2007
ISBN:978-1-59593-793-3
|
|
Authors
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 9, Downloads (12 Months): 72, Citation Count: 2
|
|
|
ABSTRACT
We provide a framework to exploit dependencies among arms in multi-armed bandit problems, when the dependencies are in the form of a generative model on clusters of arms. We find an optimal MDP-based policy for the discounted reward case, and also give an approximation of it with formal error guarantee. We discuss lower bounds on regret in the undiscounted reward scenario, and propose a general two-level bandit policy for it. We propose three different instantiations of our general policy and provide theoretical justifications of how the regret of the instantiated policies depend on the characteristics of the clusters. Finally, we empirically demonstrate the efficacy of our policies on large-scale real-world and synthetic data, and show that they significantly outperform classical policies designed for bandits with independent arms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Agrawal, R. (1995). Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27, 1054--1078.
|
| |
2
|
Chang, F., & Lai, T. L. (1987). Optimal stopping and dynamic allocation. Advances in Applied Probability, 19, 829--853.
|
| |
3
|
Dasgupta, S. (2005). Coarse sample complexity bounds for active learning. NIPS.
|
| |
4
|
Frostig, E., & Weiss, G. (1999). Four proofs of Gittins' multiarmed bandit theorem. Applied Probability Trust.
|
| |
5
|
J. C. Gittins (1979). Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society, Series B, 41, 148--177.
|
| |
6
|
Kocsis, L., & Szepesváári, C. (2006). Bandit based monte-carlo planning. ECML.
|
| |
7
|
Lai, T. L. (1987). Adaptive treatment allocation and multi-armed bandit problem. Annals of Statistics, 15(3), 1091--1114.
|
| |
8
|
Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6, 4--22.
|
| |
9
|
|
| |
10
|
Pandey, S., Agarwal, D., Chakrabarti, D., & Josifovski, V. (2007). Bandits for taxonomies: A model-based approach. SDM.
|
| |
11
|
|
| |
12
|
|
| |
13
|
Schneider, J., & Moore, A. (2002). Active learning in discrete input spaces. The 34th Interface Symposium.
|
| |
14
|
Wang, C.-C., Kulkarni, S. R., & Poor, H. (2005). Bandit problems with side observations. IEEE Transactions on Automatic Control, 50(3), 338--355.
|
| |
15
|
Whittle, P. (1980). Multi-armed bandits and the Gittins index. Journal of the Royal Statistcal Society B, 42, 143--149.
|
|