DOI: 10.5555/3091125.3091208

Reward Shaping in Episodic Reinforcement Learning

Published: 08 May 2017


Recent advances in reinforcement learning confirm that its techniques can solve large-scale problems and lead to high-quality autonomous decision making. It is only a matter of time before large-scale applications of reinforcement learning appear in sectors such as healthcare and cyber-security. However, reinforcement learning can be time-consuming because the learning algorithms have to determine the long-term consequences of their actions using delayed feedback or rewards. Reward shaping is a method of incorporating domain knowledge into reinforcement learning so that the algorithms are guided faster towards more promising solutions. Under the overarching theme of episodic reinforcement learning, this paper presents a unifying analysis of potential-based reward shaping, which leads to new theoretical insights into reward shaping in both model-free and model-based algorithms, as well as in multi-agent reinforcement learning.
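
To make the idea of potential-based reward shaping concrete, the following is a minimal sketch, not the paper's own formulation or code. It assumes the standard potential-based form, in which the shaping reward for a transition from s to s' is F(s, s') = γΦ(s') − Φ(s). The potential `phi`, the goal state `GOAL`, the trajectory, and the discount factor are all hypothetical values chosen for illustration.

```python
from typing import Callable, Sequence

State = int
GOAL: State = 5  # hypothetical goal state on a 1-D chain of states


def phi(s: State) -> float:
    """Hypothetical potential: negative distance to the goal (domain knowledge)."""
    return -float(abs(GOAL - s))


def shaping_reward(phi: Callable[[State], float],
                   s: State, s_next: State, gamma: float) -> float:
    """Potential-based shaping reward F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(s_next) - phi(s)


def discounted_shaping_sum(phi: Callable[[State], float],
                           states: Sequence[State], gamma: float) -> float:
    """Discounted sum of shaping rewards along one episode's state sequence."""
    total = 0.0
    for t in range(len(states) - 1):
        total += gamma ** t * shaping_reward(phi, states[t], states[t + 1], gamma)
    return total


# One hypothetical episode that wanders a little before reaching the goal.
trajectory = [0, 1, 2, 1, 2, 3, 4, 5]
gamma = 0.9

total = discounted_shaping_sum(phi, trajectory, gamma)

# The sum telescopes to gamma^T * phi(s_T) - phi(s_0), where T is the episode length.
T = len(trajectory) - 1
telescoped = gamma ** T * phi(trajectory[-1]) - phi(trajectory[0])
```

Under these assumptions the intermediate potentials cancel, so the total shaping contribution over an episode depends only on the start state, the final state, and the episode length. In this sketch the potential at the final state is zero, so the total reduces to the negated potential of the start state.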




Published In

AAMAS '17: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems
May 2017
1914 pages





International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC


Author Tags

  1. multiagent learning
  2. potential-based reward shaping
  3. reinforcement learning
  4. reward shaping
  5. reward structures for learning


  • Research-article

Acceptance Rates

AAMAS '17 Paper Acceptance Rate 127 of 457 submissions, 28%;
Overall Acceptance Rate 1,155 of 5,036 submissions, 23%


