DOI: 10.5555/3327144.3327250

Data-efficient hierarchical reinforcement learning

Published: 03 December 2018

Abstract

Hierarchical reinforcement learning (HRL) is a promising approach to extend traditional reinforcement learning (RL) methods to solve more complex tasks. Yet, the majority of current HRL methods require careful task-specific design and on-policy training, making them difficult to apply in real-world scenarios. In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control. For generality, we develop a scheme where lower-level controllers are supervised with goals that are learned and proposed automatically by the higher-level controllers. To address efficiency, we propose to use off-policy experience for both higher- and lower-level training. This poses a considerable challenge, since changes to the lower-level behaviors change the action space for the higher-level policy, and we introduce an off-policy correction to remedy this challenge. This allows us to take advantage of recent advances in off-policy model-free RL to learn both higher- and lower-level policies using substantially fewer environment interactions than on-policy algorithms. We term the resulting HRL agent HIRO and find that it is generally applicable and highly sample-efficient. Our experiments show that HIRO can be used to learn highly complex behaviors for simulated robots, such as pushing objects and utilizing them to reach target locations, learning from only a few million samples, equivalent to a few days of real-time interaction. In comparisons with a number of prior HRL methods, we find that our approach substantially outperforms previous state-of-the-art techniques.
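
The two ideas the abstract highlights — goal-setting supervision of the lower-level policy and an off-policy correction for the higher-level policy — can be pictured with a short sketch. The snippet below is a minimal illustration, not the paper's released implementation: it assumes a deterministic goal-conditioned lower-level policy, a HIRO-style intrinsic reward and goal transition defined on raw state offsets, and a Gaussian action model so that maximizing the log-probability of the logged lower-level actions reduces to minimizing squared error. Function names, the candidate count, and the sampling scale are placeholders.

```python
# Minimal sketch (assumptions noted above; not the authors' released code) of
# (1) goal-conditioned supervision of the lower-level policy and
# (2) off-policy relabeling of the higher-level action (the goal).

import numpy as np


def intrinsic_reward(state, goal, next_state):
    """Lower-level reward: negative distance between the reached state and
    the goal-directed target state (goals are desired state offsets)."""
    return -np.linalg.norm(state + goal - next_state)


def relabel_goal(lower_policy, states, actions, orig_goal,
                 num_sampled=8, rng=None):
    """Replace a stored higher-level goal with the candidate goal that best
    explains the logged lower-level actions under the *current* lower policy.

    states  : (T+1, state_dim) states observed during the goal's segment
    actions : (T, action_dim) lower-level actions actually taken
    lower_policy(state, goal) -> action : current deterministic lower policy
    """
    rng = rng or np.random.default_rng()
    achieved = states[-1] - states[0]          # offset actually achieved
    candidates = [orig_goal, achieved] + [
        achieved + rng.normal(scale=0.5, size=achieved.shape)
        for _ in range(num_sampled)
    ]

    def action_mismatch(goal):
        # With a Gaussian action model, maximizing the log-likelihood of the
        # logged actions is equivalent to minimizing this squared error.
        err, g = 0.0, goal
        for s, s_next, a in zip(states[:-1], states[1:], actions):
            err += float(np.sum((lower_policy(s, g) - a) ** 2))
            g = s + g - s_next                 # goal transition keeps the target state fixed
        return err

    return min(candidates, key=action_mismatch)
```

In this picture, the relabeled goal is substituted for the stored one before the higher-level transition is used in an ordinary off-policy update, which is what lets the higher level reuse experience collected under older lower-level behaviors.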



Published In

NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems
December 2018, 11021 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States
