DOI: 10.5555/3091125.3091208

Reward Shaping in Episodic Reinforcement Learning

Published: 08 May 2017

Abstract

Recent advances in reinforcement learning confirm that reinforcement learning techniques can solve large-scale problems and deliver high-quality autonomous decision making. It is only a matter of time before we see large-scale applications of reinforcement learning in sectors such as healthcare and cyber-security. However, reinforcement learning can be slow because the learning algorithms must determine the long-term consequences of their actions from delayed feedback or rewards. Reward shaping is a method of incorporating domain knowledge into reinforcement learning so that the algorithms are guided more quickly towards promising solutions. Under the overarching theme of episodic reinforcement learning, this paper presents a unifying analysis of potential-based reward shaping that yields new theoretical insights into reward shaping for both model-free and model-based algorithms, as well as for multi-agent reinforcement learning.
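
The abstract does not reproduce the shaping mechanism it analyses, so the following is a minimal sketch of standard potential-based reward shaping in tabular Q-learning: instead of the raw environment reward r, the agent learns from the shaped reward r + γΦ(s′) − Φ(s), where the potential Φ encodes domain knowledge about how promising a state is. The chain environment, the potential function phi, and all hyperparameters below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of potential-based reward shaping (PBRS) in tabular
# Q-learning. The shaped reward r + gamma*phi(s2) - phi(s) is the
# standard PBRS form; the 1-D chain task, the potential function, and
# the hyperparameters are hypothetical choices for illustration only.
import random

N_STATES, GOAL = 10, 9          # chain of states with the goal at the end
ACTIONS = (-1, +1)              # step left / step right
GAMMA, ALPHA, EPS = 0.99, 0.1, 0.2

def phi(s):
    # Domain knowledge: states nearer the goal get higher potential.
    # Note phi(GOAL) == 0, the usual convention for terminal states
    # in episodic tasks.
    return -abs(GOAL - s)

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s2 == GOAL else 0.0   # sparse, delayed reward
    return s2, reward, s2 == GOAL

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(300):
    s = 0
    for _ in range(200):                  # cap episode length
        a = random.choice(ACTIONS) if random.random() < EPS \
            else max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        shaped = r + GAMMA * phi(s2) - phi(s)   # F(s, s') added to r
        target = shaped + (0.0 if done
                           else GAMMA * max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
        if done:
            break
```

Because consecutive shaping terms telescope along any trajectory, and the terminal potential here is zero (the usual convention in episodic settings), the shaping leaves the optimal policy unchanged while turning the sparse goal reward into a dense learning signal.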

Published In

AAMAS '17: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems
May 2017
1914 pages

Sponsors

  • IFAAMAS

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC

Author Tags

  1. multiagent learning
  2. potential-based reward shaping
  3. reinforcement learning
  4. reward shaping
  5. reward structures for learning

Qualifiers

  • Research-article

Acceptance Rates

AAMAS '17 paper acceptance rate: 127 of 457 submissions (28%)
Overall acceptance rate: 1,155 of 5,036 submissions (23%)
