Article

Nonlinear inverse reinforcement learning with Gaussian processes

Authors:

Zoran Popović,

Vladlen Koltun

NIPS'11: Proceedings of the 24th International Conference on Neural Information Processing Systems

Pages 19 - 27

Published: 12 December 2011

Abstract

We present a probabilistic algorithm for nonlinear inverse reinforcement learning. The goal of inverse reinforcement learning is to learn the reward function in a Markov decision process from expert demonstrations. While most prior inverse reinforcement learning algorithms represent the reward as a linear combination of a set of features, we use Gaussian processes to learn the reward as a nonlinear function, while also determining the relevance of each feature to the expert's policy. Our probabilistic algorithm allows complex behaviors to be captured from suboptimal stochastic demonstrations, while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.
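To make the idea of a Gaussian process reward with per-feature relevance concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not state the kernel, so an automatic relevance determination (ARD) squared-exponential kernel is assumed, and the names `ard_kernel`, `gp_reward_mean`, `X_u`, `u`, `beta`, and `lam` are hypothetical. The sketch only shows how reward values at a few feature points can be interpolated nonlinearly to other states, and how a zero relevance weight makes a feature irrelevant to the reward. It does not learn anything from demonstrations.

```python
import numpy as np

def ard_kernel(Xa, Xb, beta, lam):
    """ARD squared-exponential kernel (an assumed choice).

    k(x, x') = beta * exp(-0.5 * sum_k lam[k] * (x_k - x'_k)^2)
    A weight lam[k] == 0 means feature k has no influence on the kernel.
    """
    diff = Xa[:, None, :] - Xb[None, :, :]          # shape (n_a, n_b, n_features)
    return beta * np.exp(-0.5 * np.einsum("ijk,k->ij", diff ** 2, lam))

def gp_reward_mean(X_u, u, X_all, beta, lam, jitter=1e-6):
    """GP posterior mean of the reward at X_all, given reward values u at X_u."""
    K_uu = ard_kernel(X_u, X_u, beta, lam) + jitter * np.eye(len(X_u))
    K_ru = ard_kernel(X_all, X_u, beta, lam)
    return K_ru @ np.linalg.solve(K_uu, u)

# Toy data: five feature points with two features each (made-up values).
X_u = np.column_stack([np.linspace(0.0, 1.0, 5),
                       np.array([0.3, 0.9, 0.1, 0.7, 0.5])])
u = np.sin(3.0 * X_u[:, 0])        # reward depends only on the first feature
lam = np.array([50.0, 0.0])        # second feature given zero relevance

# The posterior mean reproduces u at the feature points (up to the jitter).
r_at_u = gp_reward_mean(X_u, u, X_u, beta=1.0, lam=lam)

# Shifting the irrelevant feature leaves the predicted reward unchanged.
X_shift = X_u.copy()
X_shift[:, 1] += 0.5
r_shift = gp_reward_mean(X_u, u, X_shift, beta=1.0, lam=lam)
```

In a full method along the lines the abstract describes, the reward values and the kernel hyperparameters would be fit to the expert demonstrations rather than fixed by hand as they are here.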

Published In

NIPS'11: Proceedings of the 24th International Conference on Neural Information Processing Systems

December 2011

2752 pages

ISBN: 9781618395993

Publisher

Curran Associates Inc.

Red Hook, NY, United States
