DOI: 10.1145/3477314.3507091
Research Article | Public Access

Learning adaptive control in dynamic environments using reproducing kernel priors with Bayesian policy gradients

Published: 06 May 2022

Abstract

One of the most distinctive characteristics of biological evolution is the ability not only to learn and reinforce knowledge from prior experience, but also to develop solutions in (pseudo) real-time for future events by studying past choices. Inspired by this observation, we aim to develop a systematic methodology for dynamically learning effective control policies for robotic manipulators that deftly avoid moving obstacles in fast-changing environments. Unfortunately, dynamic obstacles present time-varying statistical sensory irregularities, making learning based solely on prior experience far less productive. Furthermore, off-the-shelf policy gradient methods often become computationally expensive, and sometimes intractable, when adapting an existing policy to a dynamically changing environment. In this paper, to mitigate both of these challenges, we propose to use vector-valued kernel embeddings (instead of parameter vectors) to represent the policy distribution as features in a non-decreasing Euclidean space. Furthermore, we develop a policy search algorithm over a Bayesian posterior estimate derived from inner products of a priori Gaussian kernels, allowing the search space to be defined as a high- (possibly infinite-) dimensional Reproducing Kernel Hilbert Space (RKHS).
Our empirical results show that the proposed method performs effectively in a collaborative multi-robot setting, where two robot arms manipulate in a dynamic real-world environment, incrementally modifying their motion plans to maintain smooth, collision-free manipulation. In particular, compared against a state-of-the-art DDPG (Deep Deterministic Policy Gradient)-based obstacle avoidance scheme as the baseline, our DRL (Developmental Reinforcement Learning) agent not only effectively avoids dynamically generated obstacles while achieving its control objective, but does so with roughly 25 times faster learning performance. A video demo of our simulated and real-world experiments is available at https://youtu.be/GMM5V0eBQCs.
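
To make the kernel-based formulation in the abstract more concrete, the sketch below illustrates the general idea of a Bayesian policy gradient with a Gaussian (RBF) prior kernel: per-trajectory score features are compared through kernel inner products, and the posterior-mean gradient is obtained as a kernel-weighted combination of those features. This is a minimal illustration of the technique described above, not the authors' implementation; the kernel choice, the synthetic data, and the names rbf_kernel and bayesian_policy_gradient_posterior are assumptions introduced here for exposition.

    import numpy as np

    def rbf_kernel(X, Y, lengthscale=1.0):
        # Gaussian (RBF) kernel matrix between the rows of X and Y.
        sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
        return np.exp(-0.5 * sq / lengthscale**2)

    def bayesian_policy_gradient_posterior(score_features, returns, noise_var=1e-2):
        # GP-style posterior mean over the policy gradient.
        #   score_features: (n, d) per-trajectory score/feature vectors (kernel inputs)
        #   returns:        (n,)  observed trajectory returns
        # The posterior-mean gradient is a kernel-weighted combination of the features.
        K = rbf_kernel(score_features, score_features)      # prior covariance over trajectories
        alpha = np.linalg.solve(K + noise_var * np.eye(len(returns)), returns)
        return score_features.T @ alpha

    # Tiny synthetic usage example (hypothetical data, for illustration only).
    rng = np.random.default_rng(0)
    Phi = rng.normal(size=(16, 4))      # 16 trajectories, 4 policy parameters
    R = Phi @ np.array([0.5, -0.2, 0.1, 0.0]) + 0.05 * rng.normal(size=16)
    print(bayesian_policy_gradient_posterior(Phi, R))

Working with kernel weights rather than an explicit parameter vector is what allows the search space to live in an RKHS: the gradient estimate never requires enumerating the (possibly infinite-dimensional) feature map, only inner products between observed trajectories.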





Published In

SAC '22: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing
April 2022
2099 pages
ISBN:9781450387132
DOI:10.1145/3477314


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. RKHS
  2. Bayesian learning
  3. deep reinforcement learning
  4. developmental learning
  5. dynamic constraints



Conference

SAC '22

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%



