ACM Home Page
Please provide us with feedback. Feedback
Learning when to stop thinking and do something!
Full text PdfPdf (901 KB)
Source ACM International Conference Proceeding Series; Vol. 382 archive
Proceedings of the 26th Annual International Conference on Machine Learning table of contents
Montreal, Quebec, Canada
Pages 825-832  
Year of Publication: 2009
ISBN:978-1-60558-516-1
Authors
Barnabás Póczos  University of Alberta, Edmonton, AB, Canada
Yasin Abbasi-Yadkori  University of Alberta, Edmonton, AB, Canada
Csaba Szepesvári  University of Alberta, Edmonton, AB, Canada
Russell Greiner  University of Alberta, Edmonton, AB, Canada
Nathan Sturtevant  University of Alberta, Edmonton, AB, Canada
Sponsors
: MITACS
: NSF
Microsoft Research : Microsoft Research
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 32,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1553374.1553480
What is a DOI?

ABSTRACT

An anytime algorithm is capable of returning a response to the given task at essentially any time; typically the quality of the response improves as the time increases. Here, we consider the challenge of learning when we should terminate such algorithms on each of a sequence of iid tasks, to optimize the expected average reward per unit time. We provide a system for addressing this challenge, which combines the global optimizer Cross-Entropy method with local gradient ascent. This paper theoretically investigates how far the estimated gradient is from the true gradient, then empirically demonstrates that this system is effective by applying it to a toy problem, as well as on a real-world face detection task.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Baxter, J., & Bartlett, P. (2001). Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15, 319--350.
 
2
Bradski, G., & Kaehler, A. (2008). Learning Open CV: Computer vision with the Open CV library. O'Reilly Press.
 
3
 
4
 
5
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13--30.
 
6
Lienhart, R., Kuranov, A., & Pisarevsky, V. (2003). Empirical analysis of detection cascades of boosted classifiers for rapid object detection. DAGM'03, 25th Pattern Recognition Symposium (pp. 297--304).
7
 
8
Rubinstein, R. Y., & Kroese, D. P. (2004). The crossentropy method. Information Science and Statistics. Springer.
 
9
Russell, S., & Subramanian, D. (1995). Provably bounded-optimal agents. Journal of Artificial Intelligence Research, 2, 575--609.
 
10
Shiryaev, A. (2007). Optimal stopping rules. Springer.
 
11
 
12
Turney, P. (2000). Types of cost in inductive concept learning. Workshop on Cost-Sensitive Learning (International Conference on Machine Learning).
 
13
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Computer Vision and Pattern Recognition (pp. 511--518).
 
14
Wald, A., & Wolfowitz, J. (1948). Optimum character of the sequential probability ratio test. Annals of Mathematical Statistics, 19, 326--339.
 
15
 
16
Zilberstein, S. (1996). Using anytime algorithms in intelligent systems. AI magazine, 17, 73--83.

Collaborative Colleagues:
Barnabás Póczos: colleagues
Yasin Abbasi-Yadkori: colleagues
Csaba Szepesvári: colleagues
Russell Greiner: colleagues
Nathan Sturtevant: colleagues