ACM Home Page
Please provide us with feedback. Feedback
Efficient reinforcement learning
Full text PdfPdf (823 KB)
Source Annual Workshop on Computational Learning Theory archive
Proceedings of the seventh annual conference on Computational learning theory table of contents
New Brunswick, New Jersey, United States
Pages: 88 - 97  
Year of Publication: 1994
ISBN:0-89791-655-7
Author
Claude-Nicolas Fiechter  Univ. of Pittsburgh, Pittsburgh, PA
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 28,   Citation Count: 13
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/180139.181019
What is a DOI?

ABSTRACT

In this paper we propose a new formal model for studying reinforcement learning, based on Valiant's PAC framework. In our model the learner does not have direct access to every state of the environment. Instead, every sequence of experiments starts in a fixed initial state and the learner is provided with a “reset” operation that interrupts the current sequence of experiments and starts a new one (from the initial state). We do not require the agent to learn the optimal policy but only a good approximation of it with high probability. More precisely, we require the learner to produce a policy whose expected value from the initial state is &egr;-close to that of the optimal policy, with probability no less than 1−&dgr;. For this model, we describe an algorithm that produces such an (&egr;,&dgr;)-optimal policy for any environment, in time polynomial in N,K,1/&egr;,1/&dgr;,1/(1−&bgr;) and rmax, where N is the number of states of the environment, K is the maximum number of actions in a state, &bgr; is the discount factor and rmax is the maximum reward on any transition.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
A.G. Barto, R.S. Sutton, and C.W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics, 13:834-846, 1983.
 
2
A.G. Barto, R.S. Sutton, and C. Watkins. Learning and sequential decision making. In M. Gabriel and J.W. Moore, editors, Learning and Computational Ncuroacie~cc. MIT Press, Cambridge, MA, 1990.
 
3
 
4
J.J. Grefenstette. Credit assignment in rule discovery systems based on genetic algorithms. Machine Learning, 5:355-382, 1989.
 
5
 
6
 
7
 
8
Wassily Hoeffding. Probability inequalities for sum of bounded random variables. Journal of the Amer. ican Statistical Association, 58(301), 1963.
 
9
J.H. Holland. Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In R.S. Michalski, }.G. Carbonell, and T.M. Mitchell, editors, Machine Learning: An Artfcial Intelligence Approach (Volume II). Morgan Kaufmann, San Mateo, CA, 1986.
 
10
S. Koenig and R.G. Simmons. Complexity analysis of real-time reinforcement learning. In Proceedings of the Tenth International Conference on Machine Learning, pages 99-105. Morgan Kaufmann, 1993.
 
11
H. Mine and S. Osaki. Markovian Decision Processes. Modern Analytic and Computational Methods in Science and Mathematics. Elsevier, New York, NY, 1970.
 
12
 
13
 
14
 
15
16
 
17
C. Watkins. Learning From Delayed Rewards. PhD thesis, Cambridge University, Cambridge, MA, 1989.
 
18

CITED BY  13


REVIEW

"Jaak Tepandi : Reviewer"

A new formal model for studying reinforcement learning is presented. In reinforcement learning, an agent learns a task that involves a sequence of actions, by interacting with the environment. The agent learns to associate appropriate actions   more...

Collaborative Colleagues:
Claude-Nicolas Fiechter: colleagues