|
ABSTRACT
This paper investigates reinforcement learning (RL) in XCS. First, it formally shows that XCS implements a method of generalized RL based on linear approximators, in which the usual input mapping function translates the state-action space into a niche relative fitness space. Then, it shows that, although XCS has always been related to standard RL, XCS is actually a method of averaging RL. More precisely, XCS with gradient descent can be actually derived from the typical update of averaging RL. It is noted that the use of averaging RL in XCS introduces an intrinsic preference toward classifiers with a smaller fitness in the niche. It is argued that, because of the accuracy pressure in XCS, this results in an additional preference toward specificity. A very simple experiment is presented to support this hypothesis. The same approach is applied to XCS with computed prediction (XCSF) and similar conclusions are drawn.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
L. C. Baird. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, pages 30--77. Morgan Kaufman Publishers, San Francisco, CA., July 1995.
|
| |
2
|
L. Booker. Approximating value functions in classifier systems. In Bull and Kovacs {4}, pages 45--61.
|
| |
3
|
J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 369--376, Cambridge, MA, 1995. The MIT Press.
|
| |
4
|
L. Bull and T. Kovacs, editors. Foundations of Learning Classifier Systems, volume 183 of Studies in Fuzziness and Soft Computing. Springer, 2005.
|
| |
5
|
M. Butz, D. G. Goldberg, and P. L. Lanzi. Gradient descent methods in learning classifier systems. Technical Report 2003028, Illinois Genetic Algorithms Laboratory -- University of Illinois at Urbana-Champaign, 117 Transportation Building, 104 S. Mathews Avenue, Urbana, IL 61801, Jan. 2003.
|
| |
6
|
M. V. Butz, D. E. Goldberg, and P. L. Lanzi. Gradient descent methods in learning classifier systems: Improving xcs performance in multistep problems. IEEE Transaction on Evolutionary Computation, 9(5):452--473, Oct. 2005.
|
| |
7
|
M. V. Butz and S. W. Wilson. An algorithmic description of xcs. Journal of Soft Computing, 6(3--4):144--153, 2002.
|
| |
8
|
G. J. Gordon. Online fitted reinforcement learning from the value function approximation. Workshop on Value Function Approximation held during the 12th International Conference on Machine Learning, 1995.
|
| |
9
|
T. Kovacs. Strength or Accuracy: Credit Assignment in Learning Classifier Systems. Distinguished Dissertations. Springer, 2004.
|
| |
10
|
P. L. Lanzi. An Analysis of Generalization in the XCS Classifier System. Evolutionary Computation Journal, 7(2):125--149, 1999.
|
| |
11
|
P. L. Lanzi. Learning classifier systems from a reinforcement learning perspective. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 6(3):162--170, 2002.
|
| |
12
|
P. L. Lanzi, D. Loiacono, S. W. Wilson, and D. E. Goldberg. Generalization in the xcsf classifier system: Analysis, improvement, and extension. Technical Report 2005012, Illinois Genetic Algorithms Laboratory -- University of Illinois at Urbana-Champaign, 2005.
|
| |
13
|
|
| |
14
|
S. I. Reynolds. Reinforcement Learning with Exploration. PhD thesis, School of Computer Science. The University of Birmingham, Birmingham, B15 2TT, Dec. 2002.
|
| |
15
|
R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1038--1044. The MIT Press, Cambridge, MA., 1996.
|
| |
16
|
|
| |
17
|
|
| |
18
|
J. N. Tsitsiklis and B. V. Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674--690, May 1997.
|
| |
19
|
A. Wada, K. Takadama, and K. Shimohara. Learning classifier system equivalent with reinforcement learning with function approximation, 2005. Eighth International Workshop on Learning Classifier Systems (IWLCS-2005).
|
| |
20
|
A. Wada, K. Takadama, K. Shimohara, and O. Katai. Learning classifier systems with convergence and generalization. In Bull and Kovacs {4}, pages 285--304.
|
| |
21
|
C. Watkins. Learning from delayed reward. PhD thesis, 1989.
|
| |
22
|
|
| |
23
|
M. Wiering. Explorations in Efficient Reinforcement Learning. PhD thesis, Universiteit van Amsterdam, The Netherlands, Feb. 1999.
|
| |
24
|
M. Wiering. Convergence and divergence in standard and averaging reinforcement learning. In Proceedings of the 15th European Conference on Machine Learning, volume 3201, pages 477--488. Springer-Verlag, Berlin, Heidelberg, 2004.
|
| |
25
|
S. W. Wilson. Classifier Fitness Based on Accuracy. Evolutionary Computation, 3(2):149--175, 1995. http://prediction-dynamics.com/.
|
| |
26
|
S. W. Wilson. Generalization in the XCS classifier system. In Genetic Programming 1998: Proceedings of the Third Annual Conference, pages 665--674. Morgan Kaufmann, 1998.
|
| |
27
|
|
|