| Learning when to stop thinking and do something! |
| Full text |
Pdf
(901 KB)
|
| Source
|
ACM International Conference Proceeding Series; Vol. 382
archive
Proceedings of the 26th Annual International Conference on Machine Learning
table of contents
Montreal, Quebec, Canada
Pages 825-832
Year of Publication: 2009
ISBN:978-1-60558-516-1
|
|
Authors
|
|
Barnabás Póczos
|
University of Alberta, Edmonton, AB, Canada
|
|
Yasin Abbasi-Yadkori
|
University of Alberta, Edmonton, AB, Canada
|
|
Csaba Szepesvári
|
University of Alberta, Edmonton, AB, Canada
|
|
Russell Greiner
|
University of Alberta, Edmonton, AB, Canada
|
|
Nathan Sturtevant
|
University of Alberta, Edmonton, AB, Canada
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 7, Downloads (12 Months): 32, Citation Count: 0
|
|
|
ABSTRACT
An anytime algorithm is capable of returning a response to the given task at essentially any time; typically the quality of the response improves as the time increases. Here, we consider the challenge of learning when we should terminate such algorithms on each of a sequence of iid tasks, to optimize the expected average reward per unit time. We provide a system for addressing this challenge, which combines the global optimizer Cross-Entropy method with local gradient ascent. This paper theoretically investigates how far the estimated gradient is from the true gradient, then empirically demonstrates that this system is effective by applying it to a toy problem, as well as on a real-world face detection task.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Baxter, J., & Bartlett, P. (2001). Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15, 319--350.
|
| |
2
|
Bradski, G., & Kaehler, A. (2008). Learning Open CV: Computer vision with the Open CV library. O'Reilly Press.
|
| |
3
|
|
| |
4
|
|
| |
5
|
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13--30.
|
| |
6
|
Lienhart, R., Kuranov, A., & Pisarevsky, V. (2003). Empirical analysis of detection cascades of boosted classifiers for rapid object detection. DAGM'03, 25th Pattern Recognition Symposium (pp. 297--304).
|
 |
7
|
|
| |
8
|
Rubinstein, R. Y., & Kroese, D. P. (2004). The crossentropy method. Information Science and Statistics. Springer.
|
| |
9
|
Russell, S., & Subramanian, D. (1995). Provably bounded-optimal agents. Journal of Artificial Intelligence Research, 2, 575--609.
|
| |
10
|
Shiryaev, A. (2007). Optimal stopping rules. Springer.
|
| |
11
|
|
| |
12
|
Turney, P. (2000). Types of cost in inductive concept learning. Workshop on Cost-Sensitive Learning (International Conference on Machine Learning).
|
| |
13
|
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Computer Vision and Pattern Recognition (pp. 511--518).
|
| |
14
|
Wald, A., & Wolfowitz, J. (1948). Optimum character of the sequential probability ratio test. Annals of Mathematical Statistics, 19, 326--339.
|
| |
15
|
|
| |
16
|
Zilberstein, S. (1996). Using anytime algorithms in intelligent systems. AI magazine, 17, 73--83.
|
|