ABSTRACT
This paper presents a novel model of reinforcement learning agents. Its distinguishing feature is the integration of the analytic hierarchy process (AHP) into a standard reinforcement learning agent model, which consists of three modules: state recognition, learning, and action selection. In our model, the AHP module encodes the primary knowledge that a human would intrinsically draw on to attain a goal state. The aim is to make the agent choose promising actions more often, especially in the early stages of learning, instead of the completely random actions taken by standard reinforcement learning algorithms. We adopt profit sharing as the reinforcement learning algorithm and demonstrate the potential of our approach on two grid-world learning problems: a pursuit problem and a Sokoban problem with deadlock. The results indicate that learning time decreases considerably on both problems and that our approach efficiently avoids deadlock in the Sokoban problem. We also show that the adverse effect usually observed when a priori knowledge is introduced into the reinforcement learning process can be restrained by decreasing the rate at which the knowledge is used during learning.
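The abstract does not give the formulas, so the following is only a minimal sketch of how the pieces it names could fit together, not the authors' implementation. It makes several assumptions: AHP priorities are approximated by normalized row geometric means of a pairwise comparison matrix (a common stand-in for the principal eigenvector), the rate of using knowledge decays exponentially per episode, the agent switches probabilistically between AHP priorities and learned weights, and profit sharing assigns geometrically decreasing credit back from the goal. All function and parameter names (`ahp_priorities`, `knowledge_rate`, `select_action`, `reinforce`, `decay`, `discount`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def ahp_priorities(pairwise):
    """Approximate an AHP priority vector from a pairwise comparison matrix.

    Uses normalized row geometric means (an assumption; the exact AHP
    method takes the principal eigenvector of the matrix).
    """
    means = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(means)
    return [m / total for m in means]


def knowledge_rate(episode, initial=1.0, decay=0.99):
    """Hypothetical schedule: knowledge use shrinks exponentially per episode."""
    return initial * decay ** episode


def select_action(learned_weights, priorities, rate, rng):
    """With probability `rate` follow AHP priorities, else learned weights.

    Both are sampled by roulette selection; each list needs a positive sum.
    """
    source = priorities if rng.random() < rate else learned_weights
    return rng.choices(range(len(source)), weights=source)[0]


def reinforce(weights, episode, reward, discount=0.5):
    """Profit-sharing style credit assignment (assumed geometric form).

    The rule fired just before the goal gets the full reward; each earlier
    rule gets `discount` times the credit of its successor.
    """
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= discount


# Usage: action 0 is judged three times as important as action 1.
priorities = ahp_priorities([[1.0, 3.0], [1.0 / 3.0, 1.0]])
rng = random.Random(0)
action = select_action([1.0, 1.0], priorities, knowledge_rate(0), rng)

weights = defaultdict(float)
reinforce(weights, [("s0", 0), ("s1", 1)], reward=1.0)
```

Under this schedule the agent relies on the AHP knowledge almost entirely at first and on its learned weights later, which is one simple way to realize the idea of decreasing the rate of using knowledge during learning.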