Reinforcement Programming

Published: 01 May 2012

Abstract

Reinforcement Programming (RP) is a new approach to automatically generating algorithms that uses reinforcement learning techniques. This paper introduces the RP approach and demonstrates its use by generating a generalized, in-place, iterative sort algorithm. The approach improves on earlier results that used genetic programming (GP): the resulting sort is a novel algorithm that is more efficient than comparable sorting routines, and RP learns it in fewer iterations and with fewer resources than GP. Experiments establish interesting empirical bounds on learning the sort: a list of size 4 is sufficient to learn the generalized algorithm, the training set requires only one element, and learning takes fewer than 200,000 iterations. Additionally, RP was used to generate three binary addition algorithms: a full adder, a binary incrementer, and a binary adder. © 2012 Wiley Periodicals, Inc.
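
To make the idea concrete, the following is a minimal, self-contained sketch of the basic ingredient: tabular Q-learning applied to sorting fixed-length lists. It is not the paper's RP system; the state encoding (the permutation itself), the adjacent-swap action set, and the reward values are illustrative assumptions, and the result is a policy for lists of length 4 rather than the generalized sort described in the abstract.

import random
from itertools import permutations

N = 4
ACTIONS = list(range(N - 1))   # action i swaps positions i and i+1
Q = {}                         # tabular action-value function: (state, action) -> value

def q(s, a):
    return Q.get((s, a), 0.0)

def step(state, a):
    # Apply an adjacent swap; reward +1 and stop once the list is sorted.
    lst = list(state)
    lst[a], lst[a + 1] = lst[a + 1], lst[a]
    nxt = tuple(lst)
    done = nxt == tuple(sorted(nxt))
    return nxt, (1.0 if done else -0.01), done   # small step cost favors short solutions

alpha, gamma, eps = 0.5, 0.95, 0.2
for _ in range(20000):                           # training episodes
    state = tuple(random.sample(range(N), N))    # random permutation of 0..3
    for _ in range(50):
        if state == tuple(sorted(state)):
            break
        if random.random() < eps:
            a = random.choice(ACTIONS)           # explore
        else:
            a = max(ACTIONS, key=lambda x: q(state, x))  # exploit
        nxt, r, done = step(state, a)
        target = r if done else r + gamma * max(q(nxt, x) for x in ACTIONS)
        Q[(state, a)] = q(state, a) + alpha * (target - q(state, a))
        state = nxt

# The greedy policy should now sort every permutation of 0..3.
failures = 0
for p in permutations(range(N)):
    s, steps = p, 0
    while s != tuple(sorted(s)) and steps < 20:
        a = max(ACTIONS, key=lambda x: q(s, x))
        s, _, _ = step(s, a)
        steps += 1
    failures += s != tuple(sorted(s))
print("unsorted permutations after greedy rollout:", failures)

A greedy rollout of the learned value function behaves like an in-place, iterative sort for 4-element lists; turning such a learned policy into a size-independent algorithm is the generalization step the abstract attributes to RP.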

Published In

Computational Intelligence, Volume 28, Issue 2
May 2012
158 pages
ISSN: 0824-7935
EISSN: 1467-8640

Publisher

Blackwell Publishers, Inc., United States
John Wiley & Sons, Inc., United States

Author Tags

  1. automatic code generation
  2. genetic programming
  3. reinforcement learning
  4. reinforcement programming
  5. state representation
