A structured prediction approach for robot imitation learning

Published: 21 February 2024

Abstract

We propose a structured prediction approach for robot imitation learning from demonstrations. Among the various tools for robot imitation learning, supervised learning plays a prominent role. Structured prediction is a form of supervised learning that enables learning models to operate on output spaces with complex structures. Through the lens of structured prediction, we show how robots can learn to imitate trajectories that belong not only to Euclidean spaces but also to Riemannian manifolds. Exploiting ideas from information theory, we propose a class of loss functions based on the f-divergence to measure the information loss between the demonstrated and reproduced probabilistic trajectories. Different choices of f-divergence result in different policies, which we call imitation modes. Furthermore, our approach enables the incorporation of spatial and temporal trajectory modulation, which is necessary for robots to adapt to changing working conditions. We benchmark our algorithm against state-of-the-art methods in terms of trajectory reproduction and adaptation. The quantitative evaluation shows that our approach outperforms other algorithms in both accuracy and efficiency. We also report real-world experimental results on learning manifold trajectories in a polishing task with a KUKA LWR robot arm, illustrating the effectiveness of our algorithmic framework.
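To make the role of the f-divergence loss concrete, the sketch below computes the Kullback-Leibler divergence, one member of the f-divergence family (corresponding to f(t) = t log t), between two Gaussian distributions that could represent a demonstrated and a reproduced trajectory point. This is a minimal illustrative example under our own assumptions about the probabilistic trajectory representation, not the paper's implementation; all variable and function names are hypothetical.

```python
# Minimal sketch (not the paper's implementation): closed-form KL divergence,
# a member of the f-divergence family, between two multivariate Gaussians
# standing in for a demonstrated and a reproduced trajectory point.
import numpy as np

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """KL(P || Q) for multivariate Gaussians P = N(mu_p, cov_p), Q = N(mu_q, cov_q)."""
    k = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    trace_term = np.trace(cov_q_inv @ cov_p)          # tr(Sigma_q^{-1} Sigma_p)
    quad_term = diff @ cov_q_inv @ diff               # Mahalanobis term
    logdet_term = np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p))
    return 0.5 * (trace_term + quad_term - k + logdet_term)

# Hypothetical 2D example: demonstrated vs. reproduced trajectory point.
mu_demo, cov_demo = np.array([0.0, 0.0]), np.eye(2)
mu_repro, cov_repro = np.array([0.1, -0.2]), 0.8 * np.eye(2)
print(gaussian_kl(mu_demo, cov_demo, mu_repro, cov_repro))
```

Other choices of the convex function f (e.g., the ones underlying the total variation or Hellinger distances) would yield different divergences and, in the paper's terminology, different imitation modes.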

Published In

International Journal of Robotics Research, Volume 43, Issue 2, February 2024, 126 pages

Publisher

Sage Publications, Inc., United States

Author Tags

1. Imitation learning
2. structured prediction
3. learning and adaptive systems
4. kernel methods
5. Riemannian manifolds

Qualifiers

• Research-article
