
Sample-efficient evolutionary function approximation for reinforcement learning

Published: 16 July 2006

Abstract

Reinforcement learning problems are commonly tackled with temporal difference (TD) methods, which attempt to estimate the agent's optimal value function. In most real-world problems, learning this value function requires a function approximator, which maps state-action pairs to values via a concise, parameterized function. In practice, the success of function approximators depends on the ability of the human designer to select an appropriate representation for the value function. A recently developed approach called evolutionary function approximation uses evolutionary computation to automate the search for effective representations. While this approach can substantially improve the performance of TD methods, it requires many sample episodes to do so. We present an enhancement to evolutionary function approximation that makes it much more sample-efficient by exploiting the off-policy nature of certain TD methods. Empirical results in a server job scheduling domain demonstrate that the enhanced method can learn better policies than evolution or TD methods alone and can do so in many fewer episodes than standard evolutionary function approximation.
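
To make the idea concrete, the following is a minimal, self-contained Python sketch (not the authors' implementation) of how sample-efficient evolutionary function approximation can work. Evolution searches over candidate value-function representations, simplified here to binary feature masks for a linear Q-function as a stand-in for evolved network topologies; each individual is further refined by Q-learning during its evaluation episodes; and, because Q-learning is off-policy, transitions saved from earlier evaluations can be replayed to pre-train offspring before they consume any new episodes. The toy chain environment, the feature encoding, and all hyperparameters are illustrative assumptions.

# A minimal sketch (not the authors' code) of sample-efficient evolutionary
# function approximation.  Evolution searches over candidate value-function
# representations (binary feature masks for a linear Q-function, a stand-in
# for evolved network topologies), each individual is refined by Q-learning
# while it is evaluated, and -- the sample-efficiency step -- transitions
# saved from earlier evaluations are replayed off-policy to pre-train
# offspring before they act in the environment.  The chain MDP, feature
# encoding, and hyperparameters below are illustrative assumptions only.
import random

N_STATES, N_ACTIONS = 10, 2            # toy chain: action 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def step(s, a):
    """One transition of the toy chain; reward 1 for reaching the right end."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else -0.01), done

def features(s, a, mask):
    """One-hot (state, action) features, gated by the evolved feature mask."""
    phi = [0.0] * (N_STATES * N_ACTIONS)
    idx = s * N_ACTIONS + a
    if mask[idx]:
        phi[idx] = 1.0
    return phi

def q_value(w, s, a, mask):
    return sum(wi * xi for wi, xi in zip(w, features(s, a, mask)))

def td_update(w, mask, s, a, r, s2, done):
    """Off-policy Q-learning update; equally valid for stored transitions."""
    target = r if done else r + GAMMA * max(q_value(w, s2, b, mask) for b in range(N_ACTIONS))
    delta = target - q_value(w, s, a, mask)
    for i, xi in enumerate(features(s, a, mask)):
        w[i] += ALPHA * delta * xi

def evaluate(mask, w, replay, episodes=5):
    """Run real episodes: act epsilon-greedily, learn online, log transitions."""
    total = 0.0
    for _ in range(episodes):
        s, done, t = 0, False, 0
        while not done and t < 100:
            if random.random() < EPSILON:
                a = random.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda b: q_value(w, s, b, mask))
            s2, r, done = step(s, a)
            replay.append((s, a, r, s2, done))   # experience is kept for later individuals
            td_update(w, mask, s, a, r, s2, done)
            total += r
            s, t = s2, t + 1
    return total / episodes

def pretrain(mask, w, replay, sweeps=3):
    """Sample-efficiency step: train a new representation on saved experience only."""
    for _ in range(sweeps):
        for (s, a, r, s2, done) in replay:
            td_update(w, mask, s, a, r, s2, done)

def mutate(mask, p=0.1):
    return [(1 - m) if random.random() < p else m for m in mask]

def run(generations=5, pop_size=6):
    replay = []
    population = [[random.randint(0, 1) for _ in range(N_STATES * N_ACTIONS)]
                  for _ in range(pop_size)]
    for g in range(generations):
        scored = []
        for mask in population:
            w = [0.0] * (N_STATES * N_ACTIONS)
            if replay:                           # offspring learn from old samples first
                pretrain(mask, w, replay)
            scored.append((evaluate(mask, w, replay), mask))
        scored.sort(key=lambda x: x[0], reverse=True)
        print(f"generation {g}: best fitness {scored[0][0]:.2f}")
        parents = [m for _, m in scored[:pop_size // 2]]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]

if __name__ == "__main__":
    random.seed(0)
    run()

The pre-training pass is where the sample savings come from: because the Q-learning update is off-policy, transitions gathered while evaluating one individual remain valid training data for any other representation, so an offspring can be trained on its predecessors' saved experience and only then spend real episodes being refined and evaluated, rather than gathering all of that experience anew.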

Published In

AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1
AAAI Press, July 2006, 1005 pages
ISBN: 9781577352815
Sponsor: AAAI (American Association for Artificial Intelligence)
