ACM Home Page
Please provide us with feedback. Feedback
Piecewise-stationary bandit problems with side observations
Full text PdfPdf (626 KB)
Source ACM International Conference Proceeding Series; Vol. 382 archive
Proceedings of the 26th Annual International Conference on Machine Learning table of contents
Montreal, Quebec, Canada
Pages 1177-1184  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
Jia Yuan Yu  McGill University, Montréal, Québec, Canada
Shie Mannor  McGill University, Montréal, Québec, Canada
Sponsors
: MITACS
: NSF
Microsoft Research : Microsoft Research
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 21,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1553374.1553524
What is a DOI?

ABSTRACT

We consider a sequential decision problem where the rewards are generated by a piecewise-stationary distribution. However, the different reward distributions are unknown and may change at unknown instants. Our approach uses a limited number of side observations on past rewards, but does not require prior knowledge of the frequency of changes. In spite of the adversarial nature of the reward process, we provide an algorithm whose regret, with respect to the baseline with perfect knowledge of the distributions and the changes, is O(k log(T)), where k is the number of changes up to time T. This is in contrast to the case where side observations are not available, and where the regret is at least Ω(√T).


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Akakpo, N. (2008). Detecting change-points in a discrete distribution via model selection. Preprint. http://arxiv.org/abs/0801.0970.
 
2
 
3
 
4
 
5
 
6
Fuh, C. D. (2004). Asymptotic operating characteristics of an optimal change point detection in hidden Markov models. Ann. Statist., 2305--2339.
 
7
Garivier, A., & Moulines, E. (2008). On upper-confidence bound policies for non-stationary bandit problems. Preprint. http://arxiv.org/abs/0805. 3415.
 
8
Hartland, C., Gelly, S., Baskiotis, N., Teytaud, O., & Sebag, M. (2006). Multi-armed bandit, dynamic environments and meta-bandits. Preprint. http://hal.archives-ouvertes.fr/hal-00113668/en/.
 
9
 
10
Kocsis, L., & Szepesváári, C. (2006). Discounted-UCB. 2nd PASCAL Challenges Workshop.
 
11
Lai, T. L. (2001). Sequential analysis: some classical problems and new challenges. Statistica Sinica, 11, 303--408.
 
12
Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6, 4--22.
 
13
 
14
Lorden, G. (1971). Procedures for reacting to a change in distribution. Ann. Math. Statist., 42, 1897--1908.
 
15
Mei, Y. J. (2006). Sequential change-point detection when unknown parameters are present in the pre-change distribution. Ann. Statist., 34, 92--122.
 
16
Page, E. S. (1954). Continuous inspection scheme. Biometrika, 41, 100--115.
 
17
Pollak, M. (1987). Average run lengths of an optimal method of detecting a change in distribution. Ann. Statist., 15, 749--779.
 
18
Shiryayev, A. N. (1963). On optimum methods in quickest detection problems. Theory Probab. Appl., 8, 22--46.

Collaborative Colleagues:
Jia Yuan Yu: colleagues
Shie Mannor: colleagues