DOI: 10.1145/3449639.3459304

Policy gradient assisted MAP-Elites

Published: 26 June 2021

Abstract

Quality-Diversity optimization algorithms such as MAP-Elites aim to generate collections of solutions to an optimization problem that are both diverse and high-performing. MAP-Elites has shown promising results in a variety of applications, in particular in evolutionary robotics tasks targeting the generation of behavioral repertoires that highlight the versatility of robots. However, in most robotics applications MAP-Elites is limited to simple open-loop or low-dimensional controllers. Here we present Policy Gradient Assisted MAP-Elites (PGA-MAP-Elites), a novel algorithm that enables MAP-Elites to efficiently evolve large neural network controllers by introducing a gradient-based variation operator inspired by Deep Reinforcement Learning. This operator leverages gradient estimates obtained from a critic neural network to rapidly find higher-performing solutions, and is paired with a traditional genetic variation operator to maintain divergent search behavior. The synergy of these operators makes PGA-MAP-Elites an efficient yet powerful algorithm for finding diverse and high-performing behaviors. We evaluate our method on four tasks for building behavioral repertoires with deep neural network controllers. The results show that PGA-MAP-Elites significantly improves the quality of the generated repertoires compared to existing methods.
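
To make the loop described in the abstract concrete, below is a minimal, illustrative sketch in Python/PyTorch. It is not the authors' implementation: the names (`Policy`, `critic`, `evaluate`, `pg_variation`, `ga_variation`) are hypothetical, the critic is assumed to be trained elsewhere (e.g. in a TD3-like fashion from a replay buffer), and a plain Gaussian mutation stands in for whatever genetic operator the paper actually uses.

```python
# Hypothetical sketch of a PGA-MAP-Elites-style loop (not the authors' code).
import copy
import random
import torch
import torch.nn as nn


class Policy(nn.Module):
    """Small deterministic controller network (stand-in for a larger one)."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def ga_variation(parent: Policy, sigma: float = 0.05) -> Policy:
    """Divergent genetic operator: here, Gaussian mutation of all weights."""
    child = copy.deepcopy(parent)
    with torch.no_grad():
        for p in child.parameters():
            p.add_(sigma * torch.randn_like(p))
    return child


def pg_variation(parent: Policy, critic, obs_batch: torch.Tensor,
                 steps: int = 10, lr: float = 1e-3) -> Policy:
    """Gradient-based operator: a few ascent steps on the critic's estimate Q(s, pi(s))."""
    child = copy.deepcopy(parent)
    optimizer = torch.optim.Adam(child.parameters(), lr=lr)
    for _ in range(steps):
        loss = -critic(obs_batch, child(obs_batch)).mean()  # maximize Q
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return child


def pga_map_elites(evaluate, critic, obs_batch, obs_dim, act_dim,
                   iterations: int = 1000, batch_size: int = 16):
    """MAP-Elites loop; the archive maps a behavior-descriptor cell to (fitness, policy)."""
    archive = {}

    def try_insert(policy):
        # evaluate() is user-supplied: rolls out the policy and returns
        # (fitness, behavior-descriptor cell).
        fitness, cell = evaluate(policy)
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, policy)

    # Random initialization of the archive.
    for _ in range(batch_size):
        try_insert(Policy(obs_dim, act_dim))

    for _ in range(iterations):
        elites = [pi for _, pi in archive.values()]
        for i, parent in enumerate(random.choices(elites, k=batch_size)):
            # Half the offspring from each variation operator.
            if i % 2 == 0:
                child = pg_variation(parent, critic, obs_batch)
            else:
                child = ga_variation(parent)
            try_insert(child)
    return archive
```

The structural point from the abstract is visible here: offspring are produced by two operators, one climbing the critic's gradient toward higher fitness and one applying undirected genetic variation, and both compete for archive cells under the usual MAP-Elites replacement rule.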

Published In

GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference
June 2021
1219 pages
ISBN:9781450383509
DOI:10.1145/3449639

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MAP-Elites
  2. neuroevolution
  3. quality-diversity

Qualifiers

  • Research-article

Conference

GECCO '21

Acceptance Rates

Overall Acceptance Rate: 1,669 of 4,410 submissions (38%)
