| Piecewise-stationary bandit problems with side observations |
| Full text |
Pdf
(626 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 382
archive
Proceedings of the 26th Annual International Conference on Machine Learning
table of contents
Montreal, Quebec, Canada
Pages 1177-1184
Year of Publication: 2009
ISBN:978-1-60558-516-1
|
|
Authors
|
|
Jia Yuan Yu
|
McGill University, Montréal, Québec, Canada
|
|
Shie Mannor
|
McGill University, Montréal, Québec, Canada
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 7, Downloads (12 Months): 21, Citation Count: 0
|
|
|
ABSTRACT
We consider a sequential decision problem where the rewards are generated by a piecewise-stationary distribution. However, the different reward distributions are unknown and may change at unknown instants. Our approach uses a limited number of side observations on past rewards, but does not require prior knowledge of the frequency of changes. In spite of the adversarial nature of the reward process, we provide an algorithm whose regret, with respect to the baseline with perfect knowledge of the distributions and the changes, is O(k log(T)), where k is the number of changes up to time T. This is in contrast to the case where side observations are not available, and where the regret is at least Ω(√T).
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Akakpo, N. (2008). Detecting change-points in a discrete distribution via model selection. Preprint. http://arxiv.org/abs/0801.0970.
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
Fuh, C. D. (2004). Asymptotic operating characteristics of an optimal change point detection in hidden Markov models. Ann. Statist., 2305--2339.
|
| |
7
|
Garivier, A., & Moulines, E. (2008). On upper-confidence bound policies for non-stationary bandit problems. Preprint. http://arxiv.org/abs/0805. 3415.
|
| |
8
|
Hartland, C., Gelly, S., Baskiotis, N., Teytaud, O., & Sebag, M. (2006). Multi-armed bandit, dynamic environments and meta-bandits. Preprint. http://hal.archives-ouvertes.fr/hal-00113668/en/.
|
| |
9
|
|
| |
10
|
Kocsis, L., & Szepesváári, C. (2006). Discounted-UCB. 2nd PASCAL Challenges Workshop.
|
| |
11
|
Lai, T. L. (2001). Sequential analysis: some classical problems and new challenges. Statistica Sinica, 11, 303--408.
|
| |
12
|
Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6, 4--22.
|
| |
13
|
|
| |
14
|
Lorden, G. (1971). Procedures for reacting to a change in distribution. Ann. Math. Statist., 42, 1897--1908.
|
| |
15
|
Mei, Y. J. (2006). Sequential change-point detection when unknown parameters are present in the pre-change distribution. Ann. Statist., 34, 92--122.
|
| |
16
|
Page, E. S. (1954). Continuous inspection scheme. Biometrika, 41, 100--115.
|
| |
17
|
Pollak, M. (1987). Average run lengths of an optimal method of detecting a change in distribution. Ann. Statist., 15, 749--779.
|
| |
18
|
Shiryayev, A. N. (1963). On optimum methods in quickest detection problems. Theory Probab. Appl., 8, 22--46.
|
|