How to train your robot with deep reinforcement learning: lessons we have learned

Published: 01 April 2021

Abstract

Deep reinforcement learning (RL) has emerged as a promising approach for autonomously acquiring complex behaviors from low-level sensor observations. Although a large portion of deep RL research has focused on applications in video games and simulated control, domains that do not capture the constraints of learning in real environments, deep RL has also demonstrated promise in enabling physical robots to learn complex skills in the real world. At the same time, real-world robotics provides an appealing domain for evaluating such algorithms, as it connects directly to how humans learn: as an embodied agent in the real world. Learning to perceive and move in the real world presents numerous challenges, some of which are easier to address than others, and some of which are often not considered in RL research that focuses only on simulated domains. In this review article, we present a number of case studies involving robotic deep RL. Building on these case studies, we discuss commonly perceived challenges in deep RL and how they have been addressed in these works. We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting and are not often the focus of mainstream RL research. Our goal is to provide a resource for both roboticists and machine learning researchers who are interested in furthering the progress of deep RL in the real world.

References

[1]
Abadi M, Agarwal A, and Barham P, et al. (2015) TensorFlow: Large-scale machine learning on heterogeneous systems. http://tensorflow.org/.
[2]
Achiam J, Held D, Tamar A, and Abbeel P (2017) Constrained policy optimization. In: International Conference on Machine Learning.
[3]
Agarwal R, Schuurmans D, and Norouzi M (2020) An optimistic perspective on offline reinforcement learning. In: International Conference on Machine Learning.
[4]
Altman E (1999) Constrained Markov Decision Processes, Vol. 7. Boca Raton, FL: CRC Press.
[5]
Andrychowicz M, Wolski F, and Ray A, et al. (2017) Hindsight experience replay. In: Advances in Neural Information Processing Systems, pp. 5048–5058.
[6]
Bellemare M, Srinivasan S, Ostrovski G, Schaul T, Saxton D, and Munos R (2016) Unifying count-based exploration and intrinsic motivation. In: Advances in Neural Information Processing Systems, pp. 1471–1479.
[7]
Bohez S, Abdolmaleki A, Neunert M, Buchli J, Heess N, and Hadsell R (2019) Value constrained model-free continuous control. arXiv preprint arXiv:1902.04623.
[8]
Bojarski M, Del Testa D, and Dworakowski D, et al. (2016) End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316.
[9]
Bousmalis K, Irpan A, and Wohlhart P, et al. (2018) Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In: International Conference on Robotics and Automation. IEEE, pp. 4243–4250.
[10]
Bousmalis K, Silberman N, Dohan D, Erhan D, and Krishnan D (2017) Unsupervised pixel-level domain adaptation with generative adversarial networks. In: Conference on Computer Vision and Pattern Recognition.
[11]
Brockman G, Cheung V, and Pettersson L, et al. (2016) OpenAI Gym. arXiv preprint arXiv:1606.01540.
[12]
Burda Y, Edwards H, Storkey A, and Klimov O (2019) Exploration by random network distillation. In: International Conference on Learning Representations.
[13]
Byravan A, Leeb F, Meier F, and Fox D (2018) SE3-Pose-Nets: Structured deep dynamics models for visuomotor control. In: International Conference on Robotics and Automation.
[14]
Cabi S, Colmenarejo SG, and Novikov A, et al. (2019) A framework for data-driven robotics. arXiv preprint arXiv:1909.12200.
[15]
Chebotar Y, Hausman K, Zhang M, Sukhatme G, Schaal S, and Levine S (2017a) Combining model-based and model-free updates for trajectory-centric reinforcement learning. In: International Conference on Machine Learning, pp. 703–711.
[16]
Chebotar Y, Kalakrishnan M, Yahya A, Li A, Schaal S, and Levine S (2017b) Path integral guided policy search. In: International Conference on Robotics and Automation. IEEE, pp. 3381–3388.
[17]
Chen Z, Badrinarayanan V, Lee CY, and Rabinovich A (2018) GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In: International Conference on Machine Learning.
[18]
Chiang HTL, Faust A, Fiser M, and Francis A (2019) Learning navigation behaviors end-to-end with AutoRL. IEEE Robotics and Automation Letters 4(2): 2007–2014.
[19]
Clavera I, Nagabandi A, and Liu S, et al. (2019) Learning to adapt in dynamic, real-world environments through meta-reinforcement learning. In: International Conference on Learning Representations.
[20]
Clegg A, Yu W, Tan J, Liu CK, and Turk G (2018) Learning to dress: Synthesizing human dressing motion via deep reinforcement learning. In: SIGGRAPH Asia 2018 Technical Papers. New York: ACM Press.
[21]
Coumans E and Bai Y (2016) PyBullet, a Python module for physics simulation, games, robotics and machine learning. http://pybullet.org/.
[22]
Daniel C, Neumann G, Kroemer O, and Peters J (2013) Learning sequential motor tasks. In: International Conference on Robotics and Automation. IEEE.
[23]
De A (2017) Modular Hopping and Running via Parallel Composition. PhD Thesis, University of Pennsylvania.
[24]
De Boer PT, Kroese DP, Mannor S, and Rubinstein RY (2005) A tutorial on the cross-entropy method. Annals of Operations Research 134(1): 19–67.
[25]
Deisenroth M and Rasmussen C (2011) PILCO: A model-based and data-efficient approach to policy search. In: International Conference on Machine Learning. Omnipress, pp. 465–472.
[26]
Deisenroth MP, Neumann G, and Peters J (2013) A survey on policy search for robotics. In: Foundations and Trends in Robotics, Vol. 2. Now Publishers, Inc., pp. 1–142.
[27]
Deng J, Dong W, Socher R, Li LJ, Li K, and Fei-Fei L (2009) ImageNet: A large-scale hierarchical image database. In: Conference on Computer Vision and Pattern Recognition. IEEE, pp. 248–255.
[28]
Devlin J, Chang MW, Lee K, and Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT.
[29]
Draeger A, Engell S, and Ranke H (1995) Model predictive control using neural networks. Control Systems Magazine 15(5): 61–66.
[30]
Duan Y, Andrychowicz M, and Stadie B, et al. (2017) One-shot imitation learning. In: Advances in Neural Information Processing Systems, pp. 1087–1098.
[31]
Duan Y, Schulman J, Chen X, Bartlett PL, Sutskever I, and Abbeel P (2016) RL2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779.
[32]
Ebert F, Finn C, Dasari S, Xie A, Lee A, and Levine S (2018) Visual foresight: Model-based deep reinforcement learning for vision-based robotic control. arXiv preprint arXiv:1812.00568.
[33]
Eysenbach B, Gu S, Ibarz J, and Levine S (2018) Leave no trace: Learning to reset for safe and autonomous reinforcement learning. In: International Conference on Learning Representations.
[34]
Finn C, Abbeel P, and Levine S (2017a) Model-agnostic meta-learning for fast adaptation of deep networks. In: International Conference on Machine Learning.
[35]
Finn C and Levine S (2017) Deep visual foresight for planning robot motion. In: International Conference on Robotics and Automation. IEEE.
[36]
Finn C, Levine S, and Abbeel P (2016a) Guided cost learning: Deep inverse optimal control via policy optimization. In: International Conference on Machine Learning, pp. 49–58.
[37]
Finn C, Tan XY, Duan Y, Darrell T, Levine S, and Abbeel P (2016b) Deep spatial autoencoders for visuomotor learning. In: International Conference on Robotics and Automation. IEEE, pp. 512–519.
[38]
Finn C, Yu T, Zhang T, Abbeel P, and Levine S (2017b) One-shot visual imitation learning via meta-learning. Proceedings of Machine Learning Research 78: 357–368.
[39]
Fox R, Pakman A, and Tishby N (2016) Taming the noise in reinforcement learning via soft updates. In: Conference on Uncertainty in Artificial Intelligence. AUAI Press.
[40]
Fu J, Co-Reyes J, and Levine S (2017) Ex2: Exploration with exemplar models for deep reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 2577–2587.
[41]
Fujimoto S, Meger D, and Precup D (2019) Off-policy deep reinforcement learning without exploration. In: International Conference on Machine Learning.
[42]
Fujimoto S, van Hoof H, and Meger D (2018) Addressing function approximation error in actor–critic methods. In: International Conference on Machine Learning.
[43]
Garofolo JS, Lamel LF, Fisher WM, Fiscus JG, and Pallett DS (1993) DARPA TIMIT acoustic–phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1. NASA STI/Recon Technical Report 93.
[44]
Ghadirzadeh A, Maki A, Kragic D, and Björkman M (2017) Deep predictive policy training using reinforcement learning. In: International Conference on Intelligent Robots and Systems. IEEE, pp. 2351–2358.
[45]
Goodfellow I, Pouget-Abadie J, and Mirza M, et al. (2014) Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680.
[46]
Gregor K, Rezende DJ, and Wierstra D (2017) Variational intrinsic control. In: International Conference on Learning Representations, Workshop Track Proceedings.
[47]
Gu S, Holly E, Lillicrap T, and Levine S (2017) Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In: International Conference on Robotics and Automation. IEEE, pp. 3389–3396.
[48]
Gu S, Lillicrap T, Sutskever I, and Levine S (2016) Continuous deep Q-learning with model-based acceleration. In: International Conference on Machine Learning, pp. 2829–2838.
[49]
Ha S, Kim J, and Yamane K (2018) Automated deep reinforcement learning environment for hardware of a modular legged robot. In: International Conference on Ubiquitous Robots. IEEE.
[50]
Haarnoja T, Ha S, Zhou A, Tan J, Tucker G, and Levine S (2019) Learning to walk via deep reinforcement learning. In: Robotics: Science and Systems.
[51]
Haarnoja T, Pong V, Zhou A, Dalal M, Abbeel P, and Levine S (2018a) Composable deep reinforcement learning for robotic manipulation. In: International Conference on Robotics and Automation. IEEE.
[52]
Haarnoja T, Tang H, Abbeel P, and Levine S (2017) Reinforcement learning with deep energy-based policies. In: International Conference on Machine Learning, pp. 1352–1361.
[53]
Haarnoja T, Zhou A, Abbeel P, and Levine S (2018b) Soft actor–critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International Conference on Machine Learning.
[54]
Haarnoja T, Zhou A, and Hartikainen K, et al. (2018c) Soft actor–critic algorithms and applications. arXiv preprint arXiv:1812.05905.
[55]
Hämäläinen P, Rajamäki J, and Liu CK (2015) Online control of simulated humanoids using particle belief propagation. ACM Transactions on Graphics 34(4): 81.
[56]
Hausman K, Chebotar Y, Kroemer O, Sukhatme GS, and Schaal S (2017) Regrasping using tactile perception and supervised policy learning. In: AAAI Symposium on Interactive Multi-Sensory Object Perception for Embodied Agents.
[57]
Heess N, Sriram S, and Lemmon J, et al. (2017) Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286.
[58]
Hester T, Vecerik M, and Pietquin O, et al. (2018) Deep Q-learning from demonstrations. In: AAAI Conference on Artificial Intelligence.
[59]
Hwangbo J, Lee J, and Dosovitskiy A, et al. (2019) Learning agile and dynamic motor skills for legged robots. Science Robotics 4(26).
[60]
Ijspeert A, Nakanishi J, and Schaal S (2002) Movement imitation with nonlinear dynamical systems in humanoid robots. In: International Conference on Robotics and Automation. IEEE.
[61]
Irpan A (2018) Deep reinforcement learning doesn't work yet. https://www.alexirpan.com/2018/02/14/rl-hard.html.
[62]
Iscen A, Caluwaerts K, and Tan J, et al. (2018) Policies modulating trajectory generators. In: Conference on Robot Learning.
[63]
Jabri A, Hsu K, Gupta A, Eysenbach B, Levine S, and Finn C (2019) Unsupervised curricula for visual meta-reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 10519–10530.
[64]
Jain D, Li A, Singhal S, Rajeswaran A, Kumar V, and Todorov E (2019) Learning deep visuomotor policies for dexterous hand manipulation. In: International Conference on Robotics and Automation. IEEE, pp. 3636–3643.
[65]
James S, Bloesch M, and Davison AJ (2018) Task-embedded control networks for few-shot imitation learning. In: Conference on Robot Learning.
[66]
James S, Davison AJ, and Johns E (2017) Transferring end-to-end visuomotor control from simulation to real world for a multi-stage task. In: Conference on Robot Learning.
[67]
James S, Wohlhart P, and Kalakrishnan M, et al. (2019) Sim-to-real via sim-to-sim: Data-efficient robotic grasping via randomized-to-canonical adaptation networks. In: Conference on Computer Vision and Pattern Recognition.
[68]
Johannink T, Bahl S, and Nair A, et al. (2019) Residual reinforcement learning for robot control. In: International Conference on Robotics and Automation. IEEE.
[69]
Kakade SM (2002) A natural policy gradient. In: Advances in Neural Information Processing Systems, pp. 1531–1538.
[70]
Kalashnikov D, Irpan A, and Pastor P, et al. (2018) Scalable deep reinforcement learning for vision-based robotic manipulation. In: Conference on Robot Learning, Proceedings of Machine Learning Research.
[71]
Khadka S, Majumdar S, and Nassar T, et al. (2019) Collaborative evolutionary reinforcement learning. In: International Conference on Machine Learning.
[72]
Kober J, Bagnell JA, and Peters J (2013) Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11): 1238–1274.
[73]
Kohl N and Stone P (2004) Policy gradient reinforcement learning for fast quadrupedal locomotion. In: International Conference on Robotics and Automation. IEEE.
[74]
Konidaris G, Kuindersma S, Grupen R, and Barto A (2012) Robot learning from demonstration by constructing skill trees. The International Journal of Robotics Research 31(3): 360–375.
[75]
Kroemer O, Niekum S, and Konidaris G (2019) A review of robot learning for manipulation: Challenges, representations, and algorithms. CoRR abs/1907.03146.
[76]
Kumar A, Fu J, Soh M, Tucker G, and Levine S (2019) Stabilizing off-policy Q-learning via bootstrapping error reduction. In: Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc.
[77]
Kurutach T, Clavera I, Duan Y, Tamar A, and Abbeel P (2018) Model-ensemble trust-region policy optimization. In: International Conference on Learning Representations.
[78]
Kuznetsova A, Rom H, and Alldrin N, et al. (2020) The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale. International Journal of Computer Vision 128(7): 1956–1981.
[79]
Lee J, Hwangbo J, Wellhausen L, Koltun V, and Hutter M (2020) Learning quadrupedal locomotion over challenging terrain. Science Robotics 5(47): eabc5986.
[80]
Lee MA, Zhu Y, and Srinivasan K, et al. (2019) Making sense of vision and touch: Self-supervised learning of multimodal representations for contact-rich tasks. In: International Conference on Robotics and Automation. IEEE.
[81]
Lenz I, Knepper RA, and Saxena A (2015) DeepMPC: Learning deep latent features for model predictive control. In: Robotics: Science and Systems, Rome, Italy.
[82]
Levine S and Abbeel P (2014) Learning neural network policies with guided policy search under unknown dynamics. In: Advances in Neural Information Processing Systems, pp. 1071–1079.
[83]
Levine S, Finn C, Darrell T, and Abbeel P (2016) End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research 17(1): 1334–1373.
[84]
Levine S and Koltun V (2013) Guided policy search. In: International Conference on Machine Learning.
[85]
Levine S, Pastor P, Krizhevsky A, Ibarz J, and Quillen D (2018) Learning hand–eye coordination for robotic grasping with deep learning and large-scale data collection. The International Journal of Robotics Research 37(4–5): 421–436.
[86]
Levine S, Wagener N, and Abbeel P (2015) Learning contact-rich manipulation skills with guided policy search. In: International Conference on Robotics and Automation.
[87]
Lillicrap TP, Hunt JJ, and Pritzel A, et al. (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
[88]
Mahajan D, Girshick R, and Ramanathan V, et al. (2018) Exploring the limits of weakly supervised pretraining. In: European Conference on Computer Vision.
[89]
Mahler J, Matl M, Liu X, Li A, Gealy D, and Goldberg K (2018) Dex-Net 3.0: Computing robust vacuum suction grasp targets in point clouds using a new analytic model and deep learning. In: International Conference on Robotics and Automation.
[90]
Mania H, Guy A, and Recht B (2018) Simple random search of static linear policies is competitive for reinforcement learning. In: Advances in Neural Information Processing Systems.
[91]
Manschitz S, Kober J, Gienger M, and Peters J (2014) Learning to sequence movement primitives from demonstrations. In: International Conference on Intelligent Robots and Systems.
[92]
Mnih V, Kavukcuoglu K, and Silver D, et al. (2013) Playing Atari with deep reinforcement learning. In: Advances in Neural Information Processing Systems, Deep Learning Workshop.
[93]
Montgomery W, Ajay A, Finn C, Abbeel P, and Levine S (2017) Reset-free guided policy search: Efficient deep reinforcement learning with stochastic initial states. In: International Conference on Robotics and Automation. IEEE, pp. 3373–3380.
[94]
Montgomery WH and Levine S (2016) Guided policy search via approximate mirror descent. In: Advances in Neural Information Processing Systems, pp. 4008–4016.
[95]
Morrison D, Corke P, and Leitner J (2018a) Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach. In: Robotics: Science and Systems.
[96]
Morrison D, Tow AW, and McTaggart M, et al. (2018b) Cartman: The low-cost Cartesian Manipulator that won the Amazon Robotics Challenge. In: International Conference on Robotics and Automation. IEEE.
[97]
Nagabandi A, Konolige K, Levine S, and Kumar V (2020) Deep dynamics models for learning dexterous manipulation. In: Conference on Robot Learning.
[98]
Nagabandi A, Yang G, and Asmar T, et al. (2018) Learning image-conditioned dynamics models for control of underactuated legged millirobots. In: International Conference on Intelligent Robots and Systems. IEEE, pp. 4606–4613.
[99]
Nair A, McGrew B, Andrychowicz M, Zaremba W, and Abbeel P (2018) Overcoming exploration in reinforcement learning with demonstrations. In: International Conference on Robotics and Automation. IEEE, pp. 6292–6299.
[100]
Osband I, Blundell C, Pritzel A, and Van Roy B (2016) Deep exploration via bootstrapped DQN. In: Advances in Neural Information Processing Systems, pp. 4026–4034.
[101]
Parisotto E, Ba J, and Salakhutdinov R (2016) Actor-Mimic: Deep multitask and transfer reinforcement learning. In: International Conference on Learning Representations.
[102]
Paszke A, Gross S, and Chintala S, et al. (2017) Automatic differentiation in PyTorch. In: Advances in Neural Information Processing Systems Workshop on Autodiff.
[103]
Pathak D, Agrawal P, Efros AA, and Darrell T (2017) Curiosity-driven exploration by self-supervised prediction. In: Conference on Computer Vision and Pattern Recognition Workshops. IEEE, pp. 16–17.
[104]
Peng XB, Abbeel P, Levine S, and van de Panne M (2018a) DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics 37(4): 143.
[105]
Peng XB, Andrychowicz M, Zaremba W, and Abbeel P (2018b) Sim-to-real transfer of robotic control with dynamics randomization. In: International Conference on Robotics and Automation. IEEE.
[106]
Peng XB, Kumar A, Zhang G, and Levine S (2019) Advantage-weighted regression: Simple and scalable off-policy reinforcement learning. arXiv preprint arXiv:1910.00177.
[107]
Peters J, Mülling K, and Altün Y (2010) Relative entropy policy search. In: AAAI Conference on Artificial Intelligence.
[108]
Peters J and Schaal S (2006) Policy gradient methods for robotics. In: International Conference on Intelligent Robots and Systems. IEEE, pp. 2219–2225.
[109]
Peters J and Schaal S (2008) Reinforcement learning of motor skills with policy gradients. Neural Networks 21(4): 682–697.
[110]
Pinto L, Davidson J, Sukthankar R, and Gupta A (2017) Robust adversarial reinforcement learning. In: International Conference on Machine Learning.
[111]
Pinto L and Gupta A (2016) Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours. In: International Conference on Robotics and Automation. IEEE.
[112]
Raibert MH (1986) Legged Robots That Balance. Cambridge, MA: MIT Press.
[113]
Rakelly K, Zhou A, Finn C, Levine S, and Quillen D (2019) Efficient off-policy meta-reinforcement learning via probabilistic context variables. In: International Conference on Machine Learning.
[114]
Rao K, Harris C, Irpan A, Levine S, Ibarz J, and Khansari M (2020) RL-CycleGAN: Reinforcement learning aware simulation-to-real. In: Conference on Computer Vision and Pattern Recognition.
[115]
Rawlik K, Toussaint M, and Vijayakumar S (2013) On stochastic optimal control and reinforcement learning by approximate inference. In: International Joint Conference on Artificial Intelligence.
[116]
Riedmiller M, Hafner R, and Lampe T, et al. (2018) Learning by playing – solving sparse reward tasks from scratch. In: International Conference on Machine Learning.
[117]
Ross S, Gordon G, and Bagnell D (2011) A reduction of imitation learning and structured prediction to no-regret online learning. In: International Conference on Artificial Intelligence and Statistics.
[118]
Rusu AA, Colmenarejo SG, and Gulcehre C, et al. (2015) Policy distillation. arXiv preprint arXiv:1511.06295.
[119]
Sadeghi F and Levine S (2017) CAD2RL: Real single-image flight without a single real image. In: Robotics: Science and Systems.
[120]
Schaal S (2006) Dynamic movement primitives: A framework for motor control in humans and humanoid robotics. In: Adaptive Motion of Animals and Machines. Berlin: Springer, pp. 261–280.
[121]
Schaul T, Borsa D, Modayil J, and Pascanu R (2019) Ray interference: A source of plateaus in deep reinforcement learning. In: Multidisciplinary Conference on Reinforcement Learning and Decision Making.
[122]
Schoettler G, Nair A, and Luo J, et al. (2019) Deep reinforcement learning for industrial insertion tasks with visual inputs and natural rewards. In: International Conference on Intelligent Robots and Systems.
[123]
Schulman J, Levine S, Abbeel P, Jordan M, and Moritz P (2015) Trust region policy optimization. In: International Conference on Machine Learning.
[124]
Schulman J, Wolski F, Dhariwal P, Radford A, and Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
[125]
Schwab D, Springenberg TJ, and Martins FM, et al. (2019) Simultaneously learning vision and feature-based control policies for real-world ball-in-a-cup. In: Robotics: Science and Systems.
[126]
Sener O and Koltun V (2018) Multi-task learning as multi-objective optimization. In: Advances in Neural Information Processing Systems, pp. 527–538.
[127]
Shrivastava A, Pfister T, Tuzel O, Susskind J, Wang W, and Webb R (2017) Learning from simulated and unsupervised images through adversarial training. In: Conference on Computer Vision and Pattern Recognition.
[128]
Silver T, Allen K, Tenenbaum J, and Kaelbling L (2018) Residual policy learning. arXiv preprint arXiv:1812.06298.
[129]
Singh A, Yang L, Finn C, and Levine S (2019) End-to-end robotic reinforcement learning without reward engineering. In: Robotics: Science and Systems.
[130]
Sünderhauf N, Brock O, and Scheirer WJ, et al. (2018) The limits and potentials of deep learning for robotics. The International Journal of Robotics Research 37(4–5): 405–420.
[131]
Tan J, Gu Y, Liu CK, and Turk G (2014) Learning bicycle stunts. ACM Transactions on Graphics 33(4): 50.
[132]
Tan J, Zhang T, and Coumans E, et al. (2018) Sim-to-real: Learning agile locomotion for quadruped robots. In: Robotics: Science and Systems.
[133]
Tang D, Agarwal A, O’Brien D, and Meyer M (2010) Overlapping experiment infrastructure: More, better, faster experimentation. In: International Conference on Knowledge Discovery and Data Mining. New York: ACM Press.
[134]
Tedrake R, Zhang TW, and Seung HS (2005) Learning to walk in 20 minutes. In: Workshop on Adaptive and Learning Systems.
[135]
ten Pas A, Gualtieri M, Saenko K, and Platt R (2017) Grasp pose detection in point clouds. The International Journal of Robotics Research 36(13–14): 1455–1473.
[136]
Thananjeyan B, Balakrishna A, and Nair S, et al. (2020) Recovery RL: Safe reinforcement learning with learned recovery zones. arXiv preprint arXiv:2010.15920.
[137]
Tobin J, Fong R, Ray A, Schneider J, Zaremba W, and Abbeel P (2017) Domain randomization for transferring deep neural networks from simulation to the real world. In: International Conference on Intelligent Robots and Systems. IEEE.
[138]
Toussaint M (2009) Robot trajectory optimization using approximate inference. In: International Conference on Machine Learning. New York: ACM Press, pp. 1049–1056.
[139]
Večerík M, Hester T, and Scholz J, et al. (2017) Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. arXiv preprint arXiv:1707.08817.
[140]
Viereck U, ten Pas A, Saenko K, and Platt R (2017) Learning a visuomotor controller for real world robotic grasping using simulated depth images. In: Conference on Robot Learning.
[141]
Wu YH, Charoenphakdee N, Bao H, Tangkaratt V, and Sugiyama M (2019) Imitation learning from imperfect demonstration. In: International Conference on Machine Learning, Proceedings of Machine Learning Research.
[142]
Xiao T, Jang E, and Kalashnikov D, et al. (2020) Thinking while moving: Deep reinforcement learning with concurrent control. In: International Conference on Learning Representations.
[143]
Xie A, Ebert F, Levine S, and Finn C (2019) Improvisation through physical understanding: using novel objects as tools with visual foresight. arXiv preprint arXiv:1904.05538.
[144]
Xie A, Singh A, Levine S, and Finn C (2018) Few-shot goal inference for visuomotor learning and planning. Proceedings of Machine Learning Research 87: 40–52.
[145]
Xie Q, Luong MT, Hovy E, and Le QV (2020) Self-training with noisy student improves ImageNet classification. In: Conference on Computer Vision and Pattern Recognition.
[146]
Yang Y, Caluwaerts K, Iscen A, Tan J, and Finn C (2019) NoRML: No-reward meta learning. In: International Conference on Autonomous Agents and Multiagent Systems.
[147]
Yang Y, Caluwaerts K, Iscen A, Zhang T, Tan J, and Sindhwani V (2020) Data efficient reinforcement learning for legged robots. In: Conference on Robot Learning.
[148]
Yen-Chen L, Bauza M, and Isola P (2020) Experience-embedded visual foresight. In: Conference on Robot Learning.
[149]
Yu K and Rodriguez A (2018) Realtime state estimation with tactile and visual sensing. Application to planar manipulation. In: International Conference on Robotics and Automation. IEEE.
[150]
Yu W, Tan J, Bai Y, Coumans E, and Ha S (2019) Learning fast adaptation with meta strategy optimization. arXiv preprint arXiv:1909.12995.
[151]
Yu W, Tan J, Liu CK, and Turk G (2017) Preparing for the unknown: Learning a universal policy with online system identification. In: Robotics: Science and Systems.
[152]
Yu W, Turk G, and Liu CK (2018) Learning symmetric and low-energy locomotion. ACM Transactions on Graphics 37(4): 144.
[153]
Zeng A, Song S, Welker S, Lee J, Rodriguez A, and Funkhouser T (2018) Learning synergies between pushing and grasping with self-supervised deep reinforcement learning. In: International Conference on Intelligent Robots and Systems, pp. 4238–4245.
[154]
Zhu H, Gupta A, Rajeswaran A, Levine S, and Kumar V (2019) Dexterous manipulation with deep reinforcement learning: Efficient, general, and low-cost. In: International Conference on Robotics and Automation. IEEE.
[155]
Ziebart BD, Maas A, Bagnell JA, and Dey AK (2008) Maximum entropy inverse reinforcement learning. In: National Conference on Artificial Intelligence.

Published In

International Journal of Robotics Research, Volume 40, Issue 4-5
April 2021
152 pages
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Publisher

Sage Publications, Inc.

United States

Author Tags

  1. Robotics
  2. reinforcement learning
  3. deep learning

Qualifiers

  • Research-article
