DOI: 10.1145/3477314.3507091
Research Article | Public Access

Learning adaptive control in dynamic environments using reproducing kernel priors with Bayesian policy gradients

Published: 06 May 2022

Abstract

One of the most distinctive characteristics of biological evolution is the ability not only to learn and reinforce knowledge from prior experience, but also to develop solutions in (pseudo) real-time for future events by studying past choices. Inspired by this observation, we aim to develop a systematic methodology for dynamically learning effective control policies for robotic manipulators that deftly avoid moving obstacles in fast-changing environments. Unfortunately, dynamic obstacles present time-varying statistical sensory irregularities, making learning based solely on prior experience far less productive. Furthermore, off-the-shelf policy gradient methods often become computationally expensive, and sometimes intractable, when adapting an existing policy to a dynamically changing environment. In this paper, to mitigate both of these challenges, we propose to use vector-valued kernel embeddings (instead of parameter vectors) to represent the policy distribution as features in a non-decreasing Euclidean space. Furthermore, we develop a policy search algorithm over a Bayesian posterior estimate derived from inner products of a priori Gaussian kernels, allowing the search space to be defined as a high- (possibly infinite-) dimensional Reproducing Kernel Hilbert Space (RKHS).
Our empirical results show that the proposed method performs effectively in a collaborative multi-robot setting, where two robot arms manipulate in a dynamic real-world environment, incrementally modifying their motion plans to maintain smooth, collision-free manipulation. In particular, compared against a state-of-the-art DDPG (Deep Deterministic Policy Gradient)-based obstacle avoidance scheme as the baseline, our DRL (Developmental Reinforcement Learning) agent not only effectively avoids dynamically generated obstacles while achieving its control objective, but does so with roughly 25 times faster learning performance. A video demo of our simulated and real-world experiments is available at https://youtu.be/GMM5V0eBQCs.
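
To make the kernel-based formulation in the abstract more concrete, the sketch below illustrates the general idea of a Bayesian policy gradient with a Gaussian (RBF) prior kernel: per-trajectory score features are compared through kernel inner products, and the posterior-mean gradient is obtained as a kernel-weighted combination of those features. This is a minimal illustration of the technique described above, not the authors' implementation; the kernel choice, the synthetic data, and the names rbf_kernel and bayesian_policy_gradient_posterior are assumptions introduced here for exposition.

    import numpy as np

    def rbf_kernel(X, Y, lengthscale=1.0):
        # Gaussian (RBF) kernel matrix between the rows of X and Y.
        sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
        return np.exp(-0.5 * sq / lengthscale**2)

    def bayesian_policy_gradient_posterior(score_features, returns, noise_var=1e-2):
        # GP-style posterior mean over the policy gradient.
        #   score_features: (n, d) per-trajectory score/feature vectors (kernel inputs)
        #   returns:        (n,)  observed trajectory returns
        # The posterior-mean gradient is a kernel-weighted combination of the features.
        K = rbf_kernel(score_features, score_features)      # prior covariance over trajectories
        alpha = np.linalg.solve(K + noise_var * np.eye(len(returns)), returns)
        return score_features.T @ alpha

    # Tiny synthetic usage example (hypothetical data, for illustration only).
    rng = np.random.default_rng(0)
    Phi = rng.normal(size=(16, 4))      # 16 trajectories, 4 policy parameters
    R = Phi @ np.array([0.5, -0.2, 0.1, 0.0]) + 0.05 * rng.normal(size=16)
    print(bayesian_policy_gradient_posterior(Phi, R))

Working with kernel weights rather than an explicit parameter vector is what allows the search space to live in an RKHS: the gradient estimate never requires enumerating the (possibly infinite-dimensional) feature map, only inner products between observed trajectories.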





Published In

SAC '22: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing
April 2022
2099 pages
ISBN:9781450387132
DOI:10.1145/3477314


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. RKHS
  2. Bayesian learning
  3. deep reinforcement learning
  4. developmental learning
  5. dynamic constraints



Conference

SAC '22

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%



