
Sample-efficient evolutionary function approximation for reinforcement learning

Published: 16 July 2006

Abstract

Reinforcement learning problems are commonly tackled with temporal difference (TD) methods, which attempt to estimate the agent's optimal value function. In most real-world problems, learning this value function requires a function approximator, which maps state-action pairs to values via a concise, parameterized function. In practice, the success of function approximators depends on the ability of the human designer to select an appropriate representation for the value function. A recently developed approach called evolutionary function approximation uses evolutionary computation to automate the search for effective representations. While this approach can substantially improve the performance of TD methods, it requires many sample episodes to do so. We present an enhancement to evolutionary function approximation that makes it much more sample-efficient by exploiting the off-policy nature of certain TD methods. Empirical results in a server job scheduling domain demonstrate that the enhanced method can learn better policies than evolution or TD methods alone and can do so in many fewer episodes than standard evolutionary function approximation.
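
To make the idea concrete, the following is a minimal, self-contained Python sketch (not the authors' implementation) of how sample-efficient evolutionary function approximation can work. Evolution searches over candidate value-function representations, simplified here to binary feature masks for a linear Q-function as a stand-in for evolved network topologies; each individual is further refined by Q-learning during its evaluation episodes; and, because Q-learning is off-policy, transitions saved from earlier evaluations can be replayed to pre-train offspring before they consume any new episodes. The toy chain environment, the feature encoding, and all hyperparameters are illustrative assumptions.

# A minimal sketch (not the authors' code) of sample-efficient evolutionary
# function approximation.  Evolution searches over candidate value-function
# representations (binary feature masks for a linear Q-function, a stand-in
# for evolved network topologies), each individual is refined by Q-learning
# while it is evaluated, and -- the sample-efficiency step -- transitions
# saved from earlier evaluations are replayed off-policy to pre-train
# offspring before they act in the environment.  The chain MDP, feature
# encoding, and hyperparameters below are illustrative assumptions only.
import random

N_STATES, N_ACTIONS = 10, 2            # toy chain: action 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def step(s, a):
    """One transition of the toy chain; reward 1 for reaching the right end."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else -0.01), done

def features(s, a, mask):
    """One-hot (state, action) features, gated by the evolved feature mask."""
    phi = [0.0] * (N_STATES * N_ACTIONS)
    idx = s * N_ACTIONS + a
    if mask[idx]:
        phi[idx] = 1.0
    return phi

def q_value(w, s, a, mask):
    return sum(wi * xi for wi, xi in zip(w, features(s, a, mask)))

def td_update(w, mask, s, a, r, s2, done):
    """Off-policy Q-learning update; equally valid for stored transitions."""
    target = r if done else r + GAMMA * max(q_value(w, s2, b, mask) for b in range(N_ACTIONS))
    delta = target - q_value(w, s, a, mask)
    for i, xi in enumerate(features(s, a, mask)):
        w[i] += ALPHA * delta * xi

def evaluate(mask, w, replay, episodes=5):
    """Run real episodes: act epsilon-greedily, learn online, log transitions."""
    total = 0.0
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 100:
            if random.random() < EPSILON:
                a = random.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda b: q_value(w, s, b, mask))
            s2, r, done = step(s, a)
            replay.append((s, a, r, s2, done))   # experience is kept for later individuals
            td_update(w, mask, s, a, r, s2, done)
            total += r
            s, t = s2, t + 1
    return total / episodes

def pretrain(mask, w, replay, sweeps=3):
    """Sample-efficiency step: train a new representation on saved experience only."""
    for _ in range(sweeps):
        for (s, a, r, s2, done) in replay:
            td_update(w, mask, s, a, r, s2, done)

def mutate(mask, p=0.1):
    return [(1 - m) if random.random() < p else m for m in mask]

def run(generations=5, pop_size=6):
    replay = []
    population = [[random.randint(0, 1) for _ in range(N_STATES * N_ACTIONS)]
                  for _ in range(pop_size)]
    for g in range(generations):
        scored = []
        for mask in population:
            w = [0.0] * (N_STATES * N_ACTIONS)
            if replay:                           # offspring learn from old samples first
                pretrain(mask, w, replay)
            scored.append((evaluate(mask, w, replay), mask))
        scored.sort(key=lambda x: x[0], reverse=True)
        print(f"generation {g}: best fitness {scored[0][0]:.2f}")
        parents = [m for _, m in scored[:pop_size // 2]]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]

if __name__ == "__main__":
    random.seed(0)
    run()

The pre-training pass is where the sample savings come from: because the Q-learning update is off-policy, transitions gathered while evaluating one individual remain valid training data for any other representation, so an offspring can be trained on its predecessors' saved experience and only then spend real episodes being refined and evaluated, rather than gathering all of that experience anew.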

Published In

AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1
AAAI Press, July 2006, 1005 pages
ISBN: 9781577352815
Sponsor: AAAI (American Association for Artificial Intelligence)
