Reinforcement Programming

Published: 01 May 2012

Abstract

Reinforcement Programming (RP) is a new approach to automatically generating algorithms that uses reinforcement learning techniques. This paper introduces the RP approach and demonstrates its use by generating a generalized, in-place, iterative sort algorithm. The approach improves on earlier results that used genetic programming (GP): the resulting sort is a novel algorithm that is more efficient than comparable sorting routines, and RP learns it in fewer iterations and with fewer resources than GP. Experiments establish interesting empirical bounds on learning the sort: a list of size 4 is sufficient to learn the generalized algorithm, the training set requires only one element, and learning takes fewer than 200,000 iterations. Additionally, RP was used to generate three binary addition algorithms: a full adder, a binary incrementer, and a binary adder. © 2012 Wiley Periodicals, Inc.
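
To make the idea concrete, the following is a minimal, self-contained sketch of the basic ingredient: tabular Q-learning applied to sorting fixed-length lists. It is not the paper's RP system; the state encoding (the permutation itself), the adjacent-swap action set, and the reward values are illustrative assumptions, and the result is a policy for lists of length 4 rather than the generalized sort described in the abstract.

import random
from itertools import permutations

N = 4
ACTIONS = list(range(N - 1))   # action i swaps positions i and i+1
Q = {}                         # tabular action-value function: (state, action) -> value

def q(s, a):
    return Q.get((s, a), 0.0)

def step(state, a):
    # Apply an adjacent swap; reward +1 and stop once the list is sorted.
    lst = list(state)
    lst[a], lst[a + 1] = lst[a + 1], lst[a]
    nxt = tuple(lst)
    done = nxt == tuple(sorted(nxt))
    return nxt, (1.0 if done else -0.01), done   # small step cost favors short solutions

alpha, gamma, eps = 0.5, 0.95, 0.2
for _ in range(20000):                           # training episodes
    state = tuple(random.sample(range(N), N))    # random permutation of 0..3
    for _ in range(50):
        if state == tuple(sorted(state)):
            break
        if random.random() < eps:
            a = random.choice(ACTIONS)           # explore
        else:
            a = max(ACTIONS, key=lambda x: q(state, x))  # exploit
        nxt, r, done = step(state, a)
        target = r if done else r + gamma * max(q(nxt, x) for x in ACTIONS)
        Q[(state, a)] = q(state, a) + alpha * (target - q(state, a))
        state = nxt

# The greedy policy should now sort every permutation of 0..3.
failures = 0
for p in permutations(range(N)):
    s, steps = p, 0
    while s != tuple(sorted(s)) and steps < 20:
        a = max(ACTIONS, key=lambda x: q(s, x))
        s, _, _ = step(s, a)
        steps += 1
    failures += s != tuple(sorted(s))
print("unsorted permutations after greedy rollout:", failures)

A greedy rollout of the learned value function behaves like an in-place, iterative sort for 4-element lists; turning such a learned policy into a size-independent algorithm is the generalization step the abstract attributes to RP.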

Published In

Computational Intelligence, Volume 28, Issue 2
May 2012
158 pages
ISSN: 0824-7935
EISSN: 1467-8640

Publisher

Blackwell Publishers, Inc., United States
John Wiley & Sons, Inc., United States

Author Tags

  1. automatic code generation
  2. genetic programming
  3. reinforcement learning
  4. reinforcement programming
  5. state representation
