Knows what it knows: a framework for self-aware learning
Source
ICML; Vol. 307
Proceedings of the 25th international conference on Machine learning
Helsinki, Finland
Pages 568-575
Year of Publication: 2008
ISBN: 978-1-60558-205-4
ABSTRACT
We introduce a learning framework that combines elements of the well-known PAC and mistake-bound models. The KWIK (knows what it knows) framework was designed particularly for its utility in learning settings where active exploration can impact the training examples the learner is exposed to, as is true in reinforcement-learning and active-learning problems. We catalog several KWIK-learnable classes and open problems.
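A minimal sketch of the kind of interaction the abstract alludes to, assuming the usual KWIK protocol in which the learner either commits to a prediction or answers "I don't know" and only then observes the true label. The setting here (deterministic labels over a finite input set, learned by memorization) is chosen for simplicity, and the names `MemorizationLearner`, `predict`, `observe`, and `run` are hypothetical, not taken from the paper.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional


class MemorizationLearner:
    """Hypothetical KWIK-style learner for a deterministic target.

    It predicts only for inputs whose label it has already observed;
    otherwise it answers "I don't know" (represented here as None).
    """

    def __init__(self) -> None:
        self.seen: Dict[Hashable, int] = {}
        self.unknown_count = 0  # number of "I don't know" answers so far

    def predict(self, x: Hashable) -> Optional[int]:
        if x in self.seen:
            return self.seen[x]
        self.unknown_count += 1
        return None

    def observe(self, x: Hashable, y: int) -> None:
        # Called only after an "I don't know" answer, when the label is revealed.
        self.seen[x] = y


def run(
    learner: MemorizationLearner,
    inputs: Iterable[Hashable],
    target: Callable[[Hashable], int],
) -> List[Optional[int]]:
    """Feed inputs one at a time; reveal the label only when the learner abstains."""
    outputs: List[Optional[int]] = []
    for x in inputs:
        guess = learner.predict(x)
        if guess is None:
            learner.observe(x, target(x))
        outputs.append(guess)
    return outputs
```

In this simplified setting the learner never makes a wrong prediction, and it abstains at most once per distinct input, so the number of "I don't know" answers is bounded by the size of the input set.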