-
Open-Endedness is Essential for Artificial Superhuman Intelligence
Authors:
Edward Hughes,
Michael Dennis,
Jack Parker-Holder,
Feryal Behbahani,
Aditi Mavalankar,
Yuge Shi,
Tom Schaul,
Tim Rocktaschel
Abstract:
In recent years there has been a tremendous surge in the general capabilities of AI systems, mainly fuelled by training foundation models on internetscale data. Nevertheless, the creation of openended, ever self-improving AI remains elusive. In this position paper, we argue that the ingredients are now in place to achieve openendedness in AI systems with respect to a human observer. Furthermore, w…
▽ More
In recent years there has been a tremendous surge in the general capabilities of AI systems, mainly fuelled by training foundation models on internetscale data. Nevertheless, the creation of openended, ever self-improving AI remains elusive. In this position paper, we argue that the ingredients are now in place to achieve openendedness in AI systems with respect to a human observer. Furthermore, we claim that such open-endedness is an essential property of any artificial superhuman intelligence (ASI). We begin by providing a concrete formal definition of open-endedness through the lens of novelty and learnability. We then illustrate a path towards ASI via open-ended systems built on top of foundation models, capable of making novel, humanrelevant discoveries. We conclude by examining the safety implications of generally-capable openended AI. We expect that open-ended foundation models will prove to be an increasingly fertile and safety-critical area of research in the near future.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning
Authors:
Jonathan Cook,
Chris Lu,
Edward Hughes,
Joel Z. Leibo,
Jakob Foerster
Abstract:
Cultural accumulation drives the open-ended and diverse progress in capabilities spanning human history. It builds an expanding body of knowledge and skills by combining individual exploration with inter-generational information transmission. Despite its widespread success among humans, the capacity for artificial learning agents to accumulate culture remains under-explored. In particular, approac…
▽ More
Cultural accumulation drives the open-ended and diverse progress in capabilities spanning human history. It builds an expanding body of knowledge and skills by combining individual exploration with inter-generational information transmission. Despite its widespread success among humans, the capacity for artificial learning agents to accumulate culture remains under-explored. In particular, approaches to reinforcement learning typically strive for improvements over only a single lifetime. Generational algorithms that do exist fail to capture the open-ended, emergent nature of cultural accumulation, which allows individuals to trade-off innovation and imitation. Building on the previously demonstrated ability for reinforcement learning agents to perform social learning, we find that training setups which balance this with independent learning give rise to cultural accumulation. These accumulating agents outperform those trained for a single lifetime with the same cumulative experience. We explore this accumulation by constructing two models under two distinct notions of a generation: episodic generations, in which accumulation occurs via in-context learning and train-time generations, in which accumulation occurs via in-weights learning. In-context and in-weights cultural accumulation can be interpreted as analogous to knowledge and skill accumulation, respectively. To the best of our knowledge, this work is the first to present general models that achieve emergent cultural accumulation in reinforcement learning, opening up new avenues towards more open-ended learning systems, as well as presenting new opportunities for modelling human culture.
△ Less
Submitted 1 June, 2024;
originally announced June 2024.
-
The Ethics of Advanced AI Assistants
Authors:
Iason Gabriel,
Arianna Manzini,
Geoff Keeling,
Lisa Anne Hendricks,
Verena Rieser,
Hasan Iqbal,
Nenad Tomašev,
Ira Ktena,
Zachary Kenton,
Mikel Rodriguez,
Seliem El-Sayed,
Sasha Brown,
Canfer Akbulut,
Andrew Trask,
Edward Hughes,
A. Stevie Bergman,
Renee Shelby,
Nahema Marchal,
Conor Griffin,
Juan Mateos-Garcia,
Laura Weidinger,
Winnie Street,
Benjamin Lange,
Alex Ingerman,
Alison Lentz
, et al. (32 additional authors not shown)
Abstract:
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, pro…
▽ More
This paper focuses on the opportunities and the ethical and societal risks posed by advanced AI assistants. We define advanced AI assistants as artificial agents with natural language interfaces, whose function is to plan and execute sequences of actions on behalf of a user, across one or more domains, in line with the user's expectations. The paper starts by considering the technology itself, providing an overview of AI assistants, their technical foundations and potential range of applications. It then explores questions around AI value alignment, well-being, safety and malicious uses. Extending the circle of inquiry further, we next consider the relationship between advanced AI assistants and individual users in more detail, exploring topics such as manipulation and persuasion, anthropomorphism, appropriate relationships, trust and privacy. With this analysis in place, we consider the deployment of advanced assistants at a societal scale, focusing on cooperation, equity and access, misinformation, economic impact, the environment and how best to evaluate advanced AI assistants. Finally, we conclude by providing a range of recommendations for researchers, developers, policymakers and public stakeholders.
△ Less
Submitted 28 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
Viblio: Introducing Credibility Signals and Citations to Video-Sharing Platforms
Authors:
Emelia Hughes,
Renee Wang,
Prerna Juneja,
Tony Li,
Tanu Mitra,
Amy Zhang
Abstract:
As more users turn to video-sharing platforms like YouTube as an information source, they may consume misinformation despite their best efforts. In this work, we investigate ways that users can better assess the credibility of videos by first exploring how users currently determine credibility using existing signals on platforms and then by introducing and evaluating new credibility-based signals.…
▽ More
As more users turn to video-sharing platforms like YouTube as an information source, they may consume misinformation despite their best efforts. In this work, we investigate ways that users can better assess the credibility of videos by first exploring how users currently determine credibility using existing signals on platforms and then by introducing and evaluating new credibility-based signals. We conducted 12 contextual inquiry interviews with YouTube users, determining that participants used a combination of existing signals, such as the channel name, the production quality, and prior knowledge, to evaluate credibility, yet sometimes stumbled in their efforts to do so. We then developed Viblio, a prototype system that enables YouTube users to view and add citations and related information while watching a video based on our participants' needs. From an evaluation with 12 people, all participants found Viblio to be intuitive and useful in the process of evaluating a video's credibility and could see themselves using Viblio in the future.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
Genie: Generative Interactive Environments
Authors:
Jake Bruce,
Michael Dennis,
Ashley Edwards,
Jack Parker-Holder,
Yuge Shi,
Edward Hughes,
Matthew Lai,
Aditi Mavalankar,
Richie Steigerwald,
Chris Apps,
Yusuf Aytar,
Sarah Bechtle,
Feryal Behbahani,
Stephanie Chan,
Nicolas Heess,
Lucy Gonzalez,
Simon Osindero,
Sherjil Ozair,
Scott Reed,
Jingwei Zhang,
Konrad Zolna,
Jeff Clune,
Nando de Freitas,
Satinder Singh,
Tim Rocktäschel
Abstract:
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotem…
▽ More
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
△ Less
Submitted 23 February, 2024;
originally announced February 2024.
-
Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization
Authors:
Yunfan Zhao,
Nikhil Behari,
Edward Hughes,
Edwin Zhang,
Dheeraj Nagaraj,
Karl Tuyls,
Aparna Taneja,
Milind Tambe
Abstract:
Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when…
▽ More
Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and requires retraining from scratch when arms opt-in and opt-out over time, a common challenge in many real world applications. We address these limitations by developing a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast generalization, we learn a novel single policy network model that utilizes feature information and employs a training procedure in which arms opt-in and out over time. We derive a new update rule for a crucial $λ$-network with theoretical convergence guarantees and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems.
△ Less
Submitted 29 January, 2024; v1 submitted 22 October, 2023;
originally announced October 2023.
-
A Unified Transformer-based Network for multimodal Emotion Recognition
Authors:
Kamran Ali,
Charles E. Hughes
Abstract:
The development of transformer-based models has resulted in significant advances in addressing various vision and NLP-based research challenges. However, the progress made in transformer-based methods has not been effectively applied to biosensing research. This paper presents a novel Unified Biosensor-Vision Multi-modal Transformer-based (UBVMT) method to classify emotions in an arousal-valence s…
▽ More
The development of transformer-based models has resulted in significant advances in addressing various vision and NLP-based research challenges. However, the progress made in transformer-based methods has not been effectively applied to biosensing research. This paper presents a novel Unified Biosensor-Vision Multi-modal Transformer-based (UBVMT) method to classify emotions in an arousal-valence space by combining a 2D representation of an ECG/PPG signal with the face information. To achieve this goal, we first investigate and compare the unimodal emotion recognition performance of three image-based representations of the ECG/PPG signal. We then present our UBVMT network which is trained to perform emotion recognition by combining the 2D image-based representation of the ECG/PPG signal and the facial expression features. Our unified transformer model consists of homogeneous transformer blocks that take as an input the 2D representation of the ECG/PPG signal and the corresponding face frame for emotion representation learning with minimal modality-specific design. Our UBVMT model is trained by reconstructing masked patches of video frames and 2D images of ECG/PPG signals, and contrastive modeling to align face and ECG/PPG data. Extensive experiments on the MAHNOB-HCI and DEAP datasets show that our Unified UBVMT-based model produces comparable results to the state-of-the-art techniques.
△ Less
Submitted 27 August, 2023;
originally announced August 2023.
-
Heterogeneous Social Value Orientation Leads to Meaningful Diversity in Sequential Social Dilemmas
Authors:
Udari Madhushani,
Kevin R. McKee,
John P. Agapiou,
Joel Z. Leibo,
Richard Everett,
Thomas Anthony,
Edward Hughes,
Karl Tuyls,
Edgar A. Duéñez-Guzmán
Abstract:
In social psychology, Social Value Orientation (SVO) describes an individual's propensity to allocate resources between themself and others. In reinforcement learning, SVO has been instantiated as an intrinsic motivation that remaps an agent's rewards based on particular target distributions of group reward. Prior studies show that groups of agents endowed with heterogeneous SVO learn diverse poli…
▽ More
In social psychology, Social Value Orientation (SVO) describes an individual's propensity to allocate resources between themself and others. In reinforcement learning, SVO has been instantiated as an intrinsic motivation that remaps an agent's rewards based on particular target distributions of group reward. Prior studies show that groups of agents endowed with heterogeneous SVO learn diverse policies in settings that resemble the incentive structure of Prisoner's dilemma. Our work extends this body of results and demonstrates that (1) heterogeneous SVO leads to meaningfully diverse policies across a range of incentive structures in sequential social dilemmas, as measured by task-specific diversity metrics; and (2) learning a best response to such policy diversity leads to better zero-shot generalization in some situations. We show that these best-response agents learn policies that are conditioned on their co-players, which we posit is the reason for improved zero-shot generalization results.
△ Less
Submitted 1 May, 2023;
originally announced May 2023.
-
Human-Timescale Adaptation in an Open-Ended Task Space
Authors:
Adaptive Agent Team,
Jakob Bauer,
Kate Baumli,
Satinder Baveja,
Feryal Behbahani,
Avishkar Bhoopchand,
Nathalie Bradley-Schmieg,
Michael Chang,
Natalie Clay,
Adrian Collister,
Vibhavari Dasagi,
Lucy Gonzalez,
Karol Gregor,
Edward Hughes,
Sheleem Kashem,
Maria Loks-Thompson,
Hannah Openshaw,
Jack Parker-Holder,
Shreya Pathak,
Nicolas Perez-Nieves,
Nemanja Rakicevic,
Tim Rocktäschel,
Yannick Schroecker,
Jakub Sygnowski,
Karl Tuyls
, et al. (3 additional authors not shown)
Abstract:
Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a…
▽ More
Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a vast space of held-out environment dynamics, our adaptive agent (AdA) displays on-the-fly hypothesis-driven exploration, efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations. Adaptation emerges from three ingredients: (1) meta-reinforcement learning across a vast, smooth and diverse task distribution, (2) a policy parameterised as a large-scale attention-based memory architecture, and (3) an effective automated curriculum that prioritises tasks at the frontier of an agent's capabilities. We demonstrate characteristic scaling laws with respect to network size, memory length, and richness of the training task distribution. We believe our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.
△ Less
Submitted 18 January, 2023;
originally announced January 2023.
-
Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments
Authors:
Ian Gemp,
Thomas Anthony,
Yoram Bachrach,
Avishkar Bhoopchand,
Kalesha Bullard,
Jerome Connor,
Vibhavari Dasagi,
Bart De Vylder,
Edgar Duenez-Guzman,
Romuald Elie,
Richard Everett,
Daniel Hennes,
Edward Hughes,
Mina Khan,
Marc Lanctot,
Kate Larson,
Guy Lever,
Siqi Liu,
Luke Marris,
Kevin R. McKee,
Paul Muller,
Julien Perolat,
Florian Strub,
Andrea Tacchetti,
Eugene Tarassov
, et al. (2 additional authors not shown)
Abstract:
The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in d…
▽ More
The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in deep reinforcement learning to explore multi-agent systems in complex environments and use these benchmarks to advance our understanding. Here, we summarise the recent work of our team and present a taxonomy that we feel highlights many important open challenges in multi-agent research.
△ Less
Submitted 22 September, 2022;
originally announced September 2022.
-
Electrically pumped quantum-dot lasers grown on 300 mm patterned Si photonic wafers
Authors:
Chen Shang,
Kaiyin Feng,
Eamonn T. Hughes,
Andrew Clark,
Mukul Debnath,
Rosalyn Koscica,
Gerald Leake,
Joshua Herman,
David Harame,
Peter Ludewig,
Yating Wan,
John E. Bowers
Abstract:
Monolithic integration of quantum dot (QD) gain materials onto Si photonic platforms via direct epitaxial growth is a promising solution for on-chip light sources. Recent developments have demonstrated superior device reliability in blanket hetero-epitaxy of III-V devices on Si at elevated temperatures. Yet, thick, defect management epi designs prevent vertical light coupling from the gain region…
▽ More
Monolithic integration of quantum dot (QD) gain materials onto Si photonic platforms via direct epitaxial growth is a promising solution for on-chip light sources. Recent developments have demonstrated superior device reliability in blanket hetero-epitaxy of III-V devices on Si at elevated temperatures. Yet, thick, defect management epi designs prevent vertical light coupling from the gain region to the Si-on-Insulator (SOI) waveguides. Here, we demonstrate the first electrically pumped QD lasers grown on a 300 mm patterned (001) Si wafer with a butt-coupled configuration by molecular beam epitaxy (MBE). Unique growth and fabrication challenges imposed by the template architecture have been resolved, contributing to continuous wave lasing to 60 °C and a maximum double-side output power of 126.6 mW at 20 °C with a double-side wall plug efficiency of 8.6%. The potential for robust on-chip laser operation and efficient low-loss light coupling to Si photonic circuits makes this heteroepitaxial integration platform on Si promising for scalable and low-cost mass production.
△ Less
Submitted 2 June, 2022;
originally announced June 2022.
-
Semi-supervised Drifted Stream Learning with Short Lookback
Authors:
Weijieying Ren,
Pengyang Wang,
Xiaolin Li,
Charles E. Hughes,
Yanjie Fu
Abstract:
In many scenarios, 1) data streams are generated in real time; 2) labeled data are expensive and only limited labels are available in the beginning; 3) real-world data is not always i.i.d. and data drift over time gradually; 4) the storage of historical streams is limited and model updating can only be achieved based on a very short lookback window. This learning setting limits the applicability a…
▽ More
In many scenarios, 1) data streams are generated in real time; 2) labeled data are expensive and only limited labels are available in the beginning; 3) real-world data is not always i.i.d. and data drift over time gradually; 4) the storage of historical streams is limited and model updating can only be achieved based on a very short lookback window. This learning setting limits the applicability and availability of many Machine Learning (ML) algorithms. We generalize the learning task under such setting as a semi-supervised drifted stream learning with short lookback problem (SDSL). SDSL imposes two under-addressed challenges on existing methods in semi-supervised learning, continuous learning, and domain adaptation: 1) robust pseudo-labeling under gradual shifts and 2) anti-forgetting adaptation with short lookback. To tackle these challenges, we propose a principled and generic generation-replay framework to solve SDSL. The framework is able to accomplish: 1) robust pseudo-labeling in the generation step; 2) anti-forgetting adaption in the replay step. To achieve robust pseudo-labeling, we develop a novel pseudo-label classification model to leverage supervised knowledge of previously labeled data, unsupervised knowledge of new data, and, structure knowledge of invariant label semantics. To achieve adaptive anti-forgetting model replay, we propose to view the anti-forgetting adaptation task as a flat region search problem. We propose a novel minimax game-based replay objective function to solve the flat region search problem and develop an effective optimization solver. Finally, we present extensive experiments to demonstrate our framework can effectively address the task of anti-forgetting learning in drifted streams with short lookback.
△ Less
Submitted 25 May, 2022;
originally announced May 2022.
-
Emergent Bartering Behaviour in Multi-Agent Reinforcement Learning
Authors:
Michael Bradley Johanson,
Edward Hughes,
Finbarr Timbers,
Joel Z. Leibo
Abstract:
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefe…
▽ More
Advances in artificial intelligence often stem from the development of new environments that abstract real-world situations into a form where research can be done conveniently. This paper contributes such an environment based on ideas inspired by elementary Microeconomics. Agents learn to produce resources in a spatially complex world, trade them with one another, and consume those that they prefer. We show that the emergent production, consumption, and pricing behaviors respond to environmental conditions in the directions predicted by supply and demand shifts in Microeconomics. We also demonstrate settings where the agents' emergent prices for goods vary over space, reflecting the local abundance of goods. After the price disparities emerge, some agents then discover a niche of transporting goods between regions with different prevailing prices -- a profitable strategy because they can buy goods where they are cheap and sell them where they are expensive. Finally, in a series of ablation experiments, we investigate how choices in the environmental rewards, bartering actions, agent architecture, and ability to consume tradable goods can either aid or inhibit the emergence of this economic behavior. This work is part of the environment development branch of a research program that aims to build human-like artificial general intelligence through multi-agent interactions in simulated societies. By exploring which environment features are needed for the basic phenomena of elementary microeconomics to emerge automatically from learning, we arrive at an environment that differs from those studied in prior multi-agent reinforcement learning work along several dimensions. For example, the model incorporates heterogeneous tastes and physical abilities, and agents negotiate with one another as a grounded form of communication.
△ Less
Submitted 13 May, 2022;
originally announced May 2022.
-
Reinforced Imitative Graph Learning for Mobile User Profiling
Authors:
Dongjie Wang,
Pengyang Wang,
Yanjie Fu,
Kunpeng Liu,
Hui Xiong,
Charles E. Hughes
Abstract:
Mobile user profiling refers to the efforts of extracting users' characteristics from mobile activities. In order to capture the dynamic varying of user characteristics for generating effective user profiling, we propose an imitation-based mobile user profiling framework. Considering the objective of teaching an autonomous agent to imitate user mobility based on the user's profile, the user profil…
▽ More
Mobile user profiling refers to the efforts of extracting users' characteristics from mobile activities. In order to capture the dynamic varying of user characteristics for generating effective user profiling, we propose an imitation-based mobile user profiling framework. Considering the objective of teaching an autonomous agent to imitate user mobility based on the user's profile, the user profile is the most accurate when the agent can perfectly mimic the user behavior patterns. The profiling framework is formulated into a reinforcement learning task, where an agent is a next-visit planner, an action is a POI that a user will visit next, and the state of the environment is a fused representation of a user and spatial entities. An event in which a user visits a POI will construct a new state, which helps the agent predict users' mobility more accurately. In the framework, we introduce a spatial Knowledge Graph (KG) to characterize the semantics of user visits over connected spatial entities. Additionally, we develop a mutual-updating strategy to quantify the state that evolves over time. Along these lines, we develop a reinforcement imitative graph learning framework for mobile user profiling. Finally, we conduct extensive experiments to demonstrate the superiority of our approach.
△ Less
Submitted 12 March, 2022;
originally announced March 2022.
-
Learning Robust Real-Time Cultural Transmission without Human Data
Authors:
Cultural General Intelligence Team,
Avishkar Bhoopchand,
Bethanie Brownfield,
Adrian Collister,
Agustin Dal Lago,
Ashley Edwards,
Richard Everett,
Alexandre Frechette,
Yanko Gitahy Oliveira,
Edward Hughes,
Kory W. Mathewson,
Piermaria Mendolicchio,
Julia Pawar,
Miruna Pislar,
Alex Platonov,
Evan Senter,
Sukhdeep Singh,
Alexander Zacherl,
Lei M. Zhang
Abstract:
Cultural transmission is the domain-general social skill that allows agents to acquire and use information from each other in real-time with high fidelity and recall. In humans, it is the inheritance process that powers cumulative cultural evolution, expanding our skills, tools and knowledge across generations. We provide a method for generating zero-shot, high recall cultural transmission in arti…
▽ More
Cultural transmission is the domain-general social skill that allows agents to acquire and use information from each other in real-time with high fidelity and recall. In humans, it is the inheritance process that powers cumulative cultural evolution, expanding our skills, tools and knowledge across generations. We provide a method for generating zero-shot, high recall cultural transmission in artificially intelligent agents. Our agents succeed at real-time cultural transmission from humans in novel contexts without using any pre-collected human data. We identify a surprisingly simple set of ingredients sufficient for generating cultural transmission and develop an evaluation methodology for rigorously assessing it. This paves the way for cultural evolution as an algorithm for developing artificial general intelligence.
△ Less
Submitted 1 March, 2022;
originally announced March 2022.
-
Collaborating with Humans without Human Data
Authors:
DJ Strouse,
Kevin R. McKee,
Matt Botvinick,
Edward Hughes,
Richard Everett
Abstract:
Collaborating with humans requires rapidly adapting to their individual strengths, weaknesses, and preferences. Unfortunately, most standard multi-agent reinforcement learning techniques, such as self-play (SP) or population play (PP), produce agents that overfit to their training partners and do not generalize well to humans. Alternatively, researchers can collect human data, train a human model…
▽ More
Collaborating with humans requires rapidly adapting to their individual strengths, weaknesses, and preferences. Unfortunately, most standard multi-agent reinforcement learning techniques, such as self-play (SP) or population play (PP), produce agents that overfit to their training partners and do not generalize well to humans. Alternatively, researchers can collect human data, train a human model using behavioral cloning, and then use that model to train "human-aware" agents ("behavioral cloning play", or BCP). While such an approach can improve the generalization of agents to new human co-players, it involves the onerous and expensive step of collecting large amounts of human data first. Here, we study the problem of how to train agents that collaborate well with human partners without using human data. We argue that the crux of the problem is to produce a diverse set of training partners. Drawing inspiration from successful multi-agent approaches in competitive domains, we find that a surprisingly simple approach is highly effective. We train our agent partner as the best response to a population of self-play agents and their past checkpoints taken throughout training, a method we call Fictitious Co-Play (FCP). Our experiments focus on a two-player collaborative cooking simulator that has recently been proposed as a challenge problem for coordination with humans. We find that FCP agents score significantly higher than SP, PP, and BCP when paired with novel agent and human partners. Furthermore, humans also report a strong subjective preference to partnering with FCP agents over all baselines.
△ Less
Submitted 7 January, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
A multi-agent reinforcement learning model of reputation and cooperation in human groups
Authors:
Kevin R. McKee,
Edward Hughes,
Tina O. Zhu,
Martin J. Chadwick,
Raphael Koster,
Antonio Garcia Castaneda,
Charlie Beattie,
Thore Graepel,
Matt Botvinick,
Joel Z. Leibo
Abstract:
Collective action demands that individuals efficiently coordinate how much, where, and when to cooperate. Laboratory experiments have extensively explored the first part of this process, demonstrating that a variety of social-cognitive mechanisms influence how much individuals choose to invest in group efforts. However, experimental research has been unable to shed light on how social cognitive me…
▽ More
Collective action demands that individuals efficiently coordinate how much, where, and when to cooperate. Laboratory experiments have extensively explored the first part of this process, demonstrating that a variety of social-cognitive mechanisms influence how much individuals choose to invest in group efforts. However, experimental research has been unable to shed light on how social cognitive mechanisms contribute to the where and when of collective action. We build and test a computational model of human behavior in Clean Up, a social dilemma task popular in multi-agent reinforcement learning research. We show that human groups effectively cooperate in Clean Up when they can identify group members and track reputations over time, but fail to organize under conditions of anonymity. A multi-agent reinforcement learning model of reputation demonstrates the same difference in cooperation under conditions of identifiability and anonymity. In addition, the model accurately predicts spatial and temporal patterns of group behavior: in this public goods dilemma, the intrinsic motivation for reputation catalyzes the development of a non-territorial, turn-taking strategy to coordinate collective action.
△ Less
Submitted 22 February, 2023; v1 submitted 8 March, 2021;
originally announced March 2021.
-
Modelling Cooperation in Network Games with Spatio-Temporal Complexity
Authors:
Michiel A. Bakker,
Richard Everett,
Laura Weidinger,
Iason Gabriel,
William S. Isaac,
Joel Z. Leibo,
Edward Hughes
Abstract:
The real world is awash with multi-agent problems that require collective action by self-interested agents, from the routing of packets across a computer network to the management of irrigation systems. Such systems have local incentives for individuals, whose behavior has an impact on the global outcome for the group. Given appropriate mechanisms describing agent interaction, groups may achieve s…
▽ More
The real world is awash with multi-agent problems that require collective action by self-interested agents, from the routing of packets across a computer network to the management of irrigation systems. Such systems have local incentives for individuals, whose behavior has an impact on the global outcome for the group. Given appropriate mechanisms describing agent interaction, groups may achieve socially beneficial outcomes, even in the face of short-term selfish incentives. In many cases, collective action problems possess an underlying graph structure, whose topology crucially determines the relationship between local decisions and emergent global effects. Such scenarios have received great attention through the lens of network games. However, this abstraction typically collapses important dimensions, such as geometry and time, relevant to the design of mechanisms promoting cooperation. In parallel work, multi-agent deep reinforcement learning has shown great promise in modelling the emergence of self-organized cooperation in complex gridworld domains. Here we apply this paradigm in graph-structured collective action problems. Using multi-agent deep reinforcement learning, we simulate an agent society for a variety of plausible mechanisms, finding clear transitions between different equilibria over time. We define analytic tools inspired by related literatures to measure the social outcomes, and use these to draw conclusions about the efficacy of different environmental interventions. Our methods have implications for mechanism design in both human and artificial agent systems.
△ Less
Submitted 13 February, 2021;
originally announced February 2021.
-
Neural Recursive Belief States in Multi-Agent Reinforcement Learning
Authors:
Pol Moreno,
Edward Hughes,
Kevin R. McKee,
Bernardo Avila Pires,
Théophane Weber
Abstract:
In multi-agent reinforcement learning, the problem of learning to act is particularly difficult because the policies of co-players may be heavily conditioned on information only observed by them. On the other hand, humans readily form beliefs about the knowledge possessed by their peers and leverage beliefs to inform decision-making. Such abilities underlie individual success in a wide range of Ma…
▽ More
In multi-agent reinforcement learning, the problem of learning to act is particularly difficult because the policies of co-players may be heavily conditioned on information only observed by them. On the other hand, humans readily form beliefs about the knowledge possessed by their peers and leverage beliefs to inform decision-making. Such abilities underlie individual success in a wide range of Markov games, from bluffing in Poker to conditional cooperation in the Prisoner's Dilemma, to convention-building in Bridge. Classical methods are usually not applicable to complex domains due to the intractable nature of hierarchical beliefs (i.e. beliefs of other agents' beliefs). We propose a scalable method to approximate these belief structures using recursive deep generative models, and to use the belief models to obtain representations useful to acting in complex tasks. Our agents trained with belief models outperform model-free baselines with equivalent representational capacity using common training paradigms. We also show that higher-order belief models outperform agents with lower-order models.
△ Less
Submitted 3 February, 2021;
originally announced February 2021.
-
Open Problems in Cooperative AI
Authors:
Allan Dafoe,
Edward Hughes,
Yoram Bachrach,
Tantum Collins,
Kevin R. McKee,
Joel Z. Leibo,
Kate Larson,
Thore Graepel
Abstract:
Problems of cooperation--in which agents seek ways to jointly improve their welfare--are ubiquitous and important. They can be found at scales ranging from our daily routines--such as driving on highways, scheduling meetings, and working collaboratively--to our global challenges--such as peace, commerce, and pandemic preparedness. Arguably, the success of the human species is rooted in our ability…
▽ More
Problems of cooperation--in which agents seek ways to jointly improve their welfare--are ubiquitous and important. They can be found at scales ranging from our daily routines--such as driving on highways, scheduling meetings, and working collaboratively--to our global challenges--such as peace, commerce, and pandemic preparedness. Arguably, the success of the human species is rooted in our ability to cooperate. Since machines powered by artificial intelligence are playing an ever greater role in our lives, it will be important to equip them with the capabilities necessary to cooperate and to foster cooperation.
We see an opportunity for the field of artificial intelligence to explicitly focus effort on this class of problems, which we term Cooperative AI. The objective of this research would be to study the many aspects of the problems of cooperation and to innovate in AI to contribute to solving these problems. Central goals include building machine agents with the capabilities needed for cooperation, building tools to foster cooperation in populations of (machine and/or human) agents, and otherwise conducting AI research for insight relevant to problems of cooperation. This research integrates ongoing work on multi-agent systems, game theory and social choice, human-machine interaction and alignment, natural-language processing, and the construction of social tools and platforms. However, Cooperative AI is not the union of these existing areas, but rather an independent bet about the productivity of specific kinds of conversations that involve these and other areas. We see opportunity to more explicitly focus on the problem of cooperation, to construct unified theory and vocabulary, and to build bridges with adjacent communities working on cooperation, including in the natural, social, and behavioural sciences.
△ Less
Submitted 15 December, 2020;
originally announced December 2020.
-
Negotiating Team Formation Using Deep Reinforcement Learning
Authors:
Yoram Bachrach,
Richard Everett,
Edward Hughes,
Angeliki Lazaridou,
Joel Z. Leibo,
Marc Lanctot,
Michael Johanson,
Wojciech M. Czarnecki,
Thore Graepel
Abstract:
When autonomous agents interact in the same environment, they must often cooperate to achieve their goals. One way for agents to cooperate effectively is to form a team, make a binding agreement on a joint plan, and execute it. However, when agents are self-interested, the gains from team formation must be allocated appropriately to incentivize agreement. Various approaches for multi-agent negotia…
▽ More
When autonomous agents interact in the same environment, they must often cooperate to achieve their goals. One way for agents to cooperate effectively is to form a team, make a binding agreement on a joint plan, and execute it. However, when agents are self-interested, the gains from team formation must be allocated appropriately to incentivize agreement. Various approaches for multi-agent negotiation have been proposed, but typically only work for particular negotiation protocols. More general methods usually require human input or domain-specific data, and so do not scale. To address this, we propose a framework for training agents to negotiate and form teams using deep reinforcement learning. Importantly, our method makes no assumptions about the specific negotiation protocol, and is instead completely experience driven. We evaluate our approach on both non-spatial and spatially extended team-formation negotiation environments, demonstrating that our agents beat hand-crafted bots and reach negotiation outcomes consistent with fair solutions predicted by cooperative game theory. Additionally, we investigate how the physical location of agents influences negotiation outcomes.
△ Less
Submitted 20 October, 2020;
originally announced October 2020.
-
Model-free conventions in multi-agent reinforcement learning with heterogeneous preferences
Authors:
Raphael Köster,
Kevin R. McKee,
Richard Everett,
Laura Weidinger,
William S. Isaac,
Edward Hughes,
Edgar A. Duéñez-Guzmán,
Thore Graepel,
Matthew Botvinick,
Joel Z. Leibo
Abstract:
Game theoretic views of convention generally rest on notions of common knowledge and hyper-rational models of individual behavior. However, decades of work in behavioral economics have questioned the validity of both foundations. Meanwhile, computational neuroscience has contributed a modernized 'dual process' account of decision-making where model-free (MF) reinforcement learning trades off with…
▽ More
Game theoretic views of convention generally rest on notions of common knowledge and hyper-rational models of individual behavior. However, decades of work in behavioral economics have questioned the validity of both foundations. Meanwhile, computational neuroscience has contributed a modernized 'dual process' account of decision-making where model-free (MF) reinforcement learning trades off with model-based (MB) reinforcement learning. The former captures habitual and procedural learning while the latter captures choices taken via explicit planning and deduction. Some conventions (e.g. international treaties) are likely supported by cognition that resonates with the game theoretic and MB accounts. However, convention formation may also occur via MF mechanisms like habit learning; though this possibility has been understudied. Here, we demonstrate that complex, large-scale conventions can emerge from MF learning mechanisms. This suggests that some conventions may be supported by habit-like cognition rather than explicit reasoning. We apply MF multi-agent reinforcement learning to a temporo-spatially extended game with incomplete information. In this game, large parts of the state space are reachable only by collective action. However, heterogeneity of tastes makes such coordinated action difficult: multiple equilibria are desirable for all players, but subgroups prefer a particular equilibrium over all others. This creates a coordination problem that can be solved by establishing a convention. We investigate start-up and free rider subproblems as well as the effects of group size, intensity of intrinsic preference, and salience on the emergence dynamics of coordination conventions. Results of our simulations show agents establish and switch between conventions, even working against their own preferred outcome when doing so is necessary for effective coordination.
△ Less
Submitted 14 December, 2020; v1 submitted 18 October, 2020;
originally announced October 2020.
-
Learning to Incentivize Other Learning Agents
Authors:
Jiachen Yang,
Ang Li,
Mehrdad Farajtabar,
Peter Sunehag,
Edward Hughes,
Hongyuan Zha
Abstract:
The challenge of developing powerful and general Reinforcement Learning (RL) agents has received increasing attention in recent years. Much of this effort has focused on the single-agent setting, in which an agent maximizes a predefined extrinsic reward function. However, a long-term question inevitably arises: how will such independent agents cooperate when they are continually learning and actin…
▽ More
The challenge of developing powerful and general Reinforcement Learning (RL) agents has received increasing attention in recent years. Much of this effort has focused on the single-agent setting, in which an agent maximizes a predefined extrinsic reward function. However, a long-term question inevitably arises: how will such independent agents cooperate when they are continually learning and acting in a shared multi-agent environment? Observing that humans often provide incentives to influence others' behavior, we propose to equip each RL agent in a multi-agent environment with the ability to give rewards directly to other agents, using a learned incentive function. Each agent learns its own incentive function by explicitly accounting for its impact on the learning of recipients and, through them, the impact on its own extrinsic objective. We demonstrate in experiments that such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games, often by finding a near-optimal division of labor. Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
△ Less
Submitted 19 October, 2020; v1 submitted 10 June, 2020;
originally announced June 2020.
-
An Efficient Integration of Disentangled Attended Expression and Identity FeaturesFor Facial Expression Transfer andSynthesis
Authors:
Kamran Ali,
Charles E. Hughes
Abstract:
In this paper, we present an Attention-based Identity Preserving Generative Adversarial Network (AIP-GAN) to overcome the identity leakage problem from a source image to a generated face image, an issue that is encountered in a cross-subject facial expression transfer and synthesis process. Our key insight is that the identity preserving network should be able to disentangle and compose shape, app…
▽ More
In this paper, we present an Attention-based Identity Preserving Generative Adversarial Network (AIP-GAN) to overcome the identity leakage problem from a source image to a generated face image, an issue that is encountered in a cross-subject facial expression transfer and synthesis process. Our key insight is that the identity preserving network should be able to disentangle and compose shape, appearance, and expression information for efficient facial expression transfer and synthesis. Specifically, the expression encoder of our AIP-GAN disentangles the expression information from the input source image by predicting its facial landmarks using our supervised spatial and channel-wise attention module. Similarly, the disentangled expression-agnostic identity features are extracted from the input target image by inferring its combined intrinsic-shape and appearance image employing our self-supervised spatial and channel-wise attention mod-ule. To leverage the expression and identity information encoded by the intermediate layers of both of our encoders, we combine these features with the features learned by the intermediate layers of our decoder using a cross-encoder bilinear pooling operation. Experimental results show the promising performance of our AIP-GAN based technique.
△ Less
Submitted 1 May, 2020;
originally announced May 2020.
-
Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games
Authors:
Edward Hughes,
Thomas W. Anthony,
Tom Eccles,
Joel Z. Leibo,
David Balduzzi,
Yoram Bachrach
Abstract:
Zero-sum games have long guided artificial intelligence research, since they possess both a rich strategy space of best-responses and a clear evaluation metric. What's more, competition is a vital mechanism in many real-world multi-agent systems capable of generating intelligent innovations: Darwinian evolution, the market economy and the AlphaZero algorithm, to name a few. In two-player zero-sum…
▽ More
Zero-sum games have long guided artificial intelligence research, since they possess both a rich strategy space of best-responses and a clear evaluation metric. What's more, competition is a vital mechanism in many real-world multi-agent systems capable of generating intelligent innovations: Darwinian evolution, the market economy and the AlphaZero algorithm, to name a few. In two-player zero-sum games, the challenge is usually viewed as finding Nash equilibrium strategies, safeguarding against exploitation regardless of the opponent. While this captures the intricacies of chess or Go, it avoids the notion of cooperation with co-players, a hallmark of the major transitions leading from unicellular organisms to human civilization. Beyond two players, alliance formation often confers an advantage; however this requires trust, namely the promise of mutual cooperation in the face of incentives to defect. Successful play therefore requires adaptation to co-players rather than the pursuit of non-exploitability. Here we argue that a systematic study of many-player zero-sum games is a crucial element of artificial intelligence research. Using symmetric zero-sum matrix games, we demonstrate formally that alliance formation may be seen as a social dilemma, and empirically that naïve multi-agent reinforcement learning therefore fails to form alliances. We introduce a toy model of economic competition, and show how reinforcement learning may be augmented with a peer-to-peer contract mechanism to discover and enforce alliances. Finally, we generalize our agent model to incorporate temporally-extended contracts, presenting opportunities for further work.
△ Less
Submitted 27 February, 2020;
originally announced March 2020.
-
Social diversity and social preferences in mixed-motive reinforcement learning
Authors:
Kevin R. McKee,
Ian Gemp,
Brian McWilliams,
Edgar A. Duéñez-Guzmán,
Edward Hughes,
Joel Z. Leibo
Abstract:
Recent research on reinforcement learning in pure-conflict and pure-common interest games has emphasized the importance of population heterogeneity. In contrast, studies of reinforcement learning in mixed-motive games have primarily leveraged homogeneous approaches. Given the defining characteristic of mixed-motive games--the imperfect correlation of incentives between group members--we study the…
▽ More
Recent research on reinforcement learning in pure-conflict and pure-common interest games has emphasized the importance of population heterogeneity. In contrast, studies of reinforcement learning in mixed-motive games have primarily leveraged homogeneous approaches. Given the defining characteristic of mixed-motive games--the imperfect correlation of incentives between group members--we study the effect of population heterogeneity on mixed-motive reinforcement learning. We draw on interdependence theory from social psychology and imbue reinforcement learning agents with Social Value Orientation (SVO), a flexible formalization of preferences over group outcome distributions. We subsequently explore the effects of diversity in SVO on populations of reinforcement learning agents in two mixed-motive Markov games. We demonstrate that heterogeneity in SVO generates meaningful and complex behavioral variation among agents similar to that suggested by interdependence theory. Empirical results in these mixed-motive dilemmas suggest agents trained in heterogeneous populations develop particularly generalized, high-performing policies relative to those trained in homogeneous populations.
△ Less
Submitted 12 February, 2020; v1 submitted 6 February, 2020;
originally announced February 2020.
-
The Rockerverse: Packages and Applications for Containerization with R
Authors:
Daniel Nüst,
Dirk Eddelbuettel,
Dom Bennett,
Robrecht Cannoodt,
Dav Clark,
Gergely Daroczi,
Mark Edmondson,
Colin Fay,
Ellis Hughes,
Lars Kjeldgaard,
Sean Lopp,
Ben Marwick,
Heather Nolis,
Jacqueline Nolis,
Hong Ooi,
Karthik Ram,
Noam Ross,
Lori Shepherd,
Péter Sólymos,
Tyson Lee Swetnam,
Nitesh Turaga,
Charlotte Van Petegem,
Jason Williams,
Craig Willis,
Nan Xiao
Abstract:
The Rocker Project provides widely used Docker images for R across different application scenarios. This article surveys downstream projects that build upon the Rocker Project images and presents the current state of R packages for managing Docker images and controlling containers. These use cases cover diverse topics such as package development, reproducible research, collaborative work, cloud-ba…
▽ More
The Rocker Project provides widely used Docker images for R across different application scenarios. This article surveys downstream projects that build upon the Rocker Project images and presents the current state of R packages for managing Docker images and controlling containers. These use cases cover diverse topics such as package development, reproducible research, collaborative work, cloud-based data processing, and production deployment of services. The variety of applications demonstrates the power of the Rocker Project specifically and containerisation in general. Across the diverse ways to use containers, we identified common themes: reproducible environments, scalability and efficiency, and portability across clouds. We conclude that the current growth and diversification of use cases is likely to continue its positive impact, but see the need for consolidating the Rockerverse ecosystem of packages, developing common practices for applications, and exploring alternative containerisation software.
△ Less
Submitted 17 August, 2020; v1 submitted 28 January, 2020;
originally announced January 2020.
-
Smooth markets: A basic mechanism for organizing gradient-based learners
Authors:
David Balduzzi,
Wojciech M Czarnecki,
Thomas W Anthony,
Ian M Gemp,
Edward Hughes,
Joel Z Leibo,
Georgios Piliouras,
Thore Graepel
Abstract:
With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-player games with pairwise zero sum interactions. SM-games codi…
▽ More
With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact. Unfortunately, negative results from game theory show there is little hope of understanding or controlling general n-player games. We therefore introduce smooth markets (SM-games), a class of n-player games with pairwise zero sum interactions. SM-games codify a common design pattern in machine learning that includes (some) GANs, adversarial training, and other recent algorithms. We show that SM-games are amenable to analysis and optimization using first-order methods.
△ Less
Submitted 18 January, 2020; v1 submitted 14 January, 2020;
originally announced January 2020.
-
Facial Expression Representation Learning by Synthesizing Expression Images
Authors:
Kamran Ali,
Charles E. Hughes
Abstract:
Representations used for Facial Expression Recognition (FER) usually contain expression information along with identity features. In this paper, we propose a novel Disentangled Expression learning-Generative Adversarial Network (DE-GAN) which combines the concept of disentangled representation learning with residue learning to explicitly disentangle facial expression representation from identity i…
▽ More
Representations used for Facial Expression Recognition (FER) usually contain expression information along with identity features. In this paper, we propose a novel Disentangled Expression learning-Generative Adversarial Network (DE-GAN) which combines the concept of disentangled representation learning with residue learning to explicitly disentangle facial expression representation from identity information. In this method the facial expression representation is learned by reconstructing an expression image employing an encoder-decoder based generator. Unlike previous works using only expression residual learning for facial expression recognition, our method learns the disentangled expression representation along with the expressive component recorded by the encoder of DE-GAN. In order to improve the quality of synthesized expression images and the effectiveness of the learned disentangled expression representation, expression and identity classification is performed by the discriminator of DE-GAN. Experiments performed on widely used datasets (CK+, MMI, Oulu-CASIA) show that the proposed technique produces comparable or better results than state-of-the-art methods.
△ Less
Submitted 30 November, 2019;
originally announced December 2019.
-
All-In-One: Facial Expression Transfer, Editing and Recognition Using A Single Network
Authors:
Kamran Ali,
Charles E. Hughes
Abstract:
In this paper, we present a unified architecture known as Transfer-Editing and Recognition Generative Adversarial Network (TER-GAN) which can be used: 1. to transfer facial expressions from one identity to another identity, known as Facial Expression Transfer (FET), 2. to transform the expression of a given image to a target expression, while preserving the identity of the image, known as Facial E…
▽ More
In this paper, we present a unified architecture known as Transfer-Editing and Recognition Generative Adversarial Network (TER-GAN) which can be used: 1. to transfer facial expressions from one identity to another identity, known as Facial Expression Transfer (FET), 2. to transform the expression of a given image to a target expression, while preserving the identity of the image, known as Facial Expression Editing (FEE), and 3. to recognize the facial expression of a face image, known as Facial Expression Recognition (FER). In TER-GAN, we combine the capabilities of generative models to generate synthetic images, while learning important information about the input images during the reconstruction process. More specifically, two encoders are used in TER-GAN to encode identity and expression information from two input images, and a synthetic expression image is generated by the decoder part of TER-GAN. To improve the feature disentanglement and extraction process, we also introduce a novel expression consistency loss and an identity consistency loss which exploit extra expression and identity information from generated images. Experimental results show that the proposed method can be used for efficient facial expression transfer, facial expression editing and facial expression recognition. In order to evaluate the proposed technique and to compare our results with state-of-the-art methods, we have used the Oulu-CASIA dataset for our experiments.
△ Less
Submitted 16 November, 2019;
originally announced November 2019.
-
Facial Expression Recognition Using Disentangled Adversarial Learning
Authors:
Kamran Ali,
Charles E. Hughes
Abstract:
The representation used for Facial Expression Recognition (FER) usually contain expression information along with other variations such as identity and illumination. In this paper, we propose a novel Disentangled Expression learning-Generative Adversarial Network (DE-GAN) to explicitly disentangle facial expression representation from identity information. In this learning by reconstruction method…
▽ More
The representation used for Facial Expression Recognition (FER) usually contain expression information along with other variations such as identity and illumination. In this paper, we propose a novel Disentangled Expression learning-Generative Adversarial Network (DE-GAN) to explicitly disentangle facial expression representation from identity information. In this learning by reconstruction method, facial expression representation is learned by reconstructing an expression image employing an encoder-decoder based generator. This expression representation is disentangled from identity component by explicitly providing the identity code to the decoder part of DE-GAN. The process of expression image reconstruction and disentangled expression representation learning is improved by performing expression and identity classification in the discriminator of DE-GAN. The disentangled facial expression representation is then used for facial expression recognition employing simple classifiers like SVM or MLP. The experiments are performed on publicly available and widely used face expression databases (CK+, MMI, Oulu-CASIA). The experimental results show that the proposed technique produces comparable results with state-of-the-art methods.
△ Less
Submitted 28 September, 2019;
originally announced September 2019.
-
A Generalized Training Approach for Multiagent Learning
Authors:
Paul Muller,
Shayegan Omidshafiei,
Mark Rowland,
Karl Tuyls,
Julien Perolat,
Siqi Liu,
Daniel Hennes,
Luke Marris,
Marc Lanctot,
Edward Hughes,
Zhe Wang,
Guy Lever,
Nicolas Heess,
Thore Graepel,
Remi Munos
Abstract:
This paper investigates a population-based training regime based on game-theoretic principles called Policy-Spaced Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have been focused on two-…
▽ More
This paper investigates a population-based training regime based on game-theoretic principles called Policy-Spaced Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have been focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable. In moving from two-player zero-sum games to more general settings, computation of Nash equilibria quickly becomes infeasible. Here, we extend the theoretical underpinnings of PSRO by considering an alternative solution concept, $α$-Rank, which is unique (thus faces no equilibrium selection issues, unlike Nash) and applies readily to general-sum, many-player settings. We establish convergence guarantees in several games classes, and identify links between Nash equilibria and $α$-Rank. We demonstrate the competitive performance of $α$-Rank-based PSRO against an exact Nash solver-based PSRO in 2-player Kuhn and Leduc Poker. We then go beyond the reach of prior PSRO applications by considering 3- to 5-player poker games, yielding instances where $α$-Rank achieves faster convergence than approximate Nash solvers, thus establishing it as a favorable general games solver. We also carry out an initial empirical validation in MuJoCo soccer, illustrating the feasibility of the proposed approach in another complex domain.
△ Less
Submitted 14 February, 2020; v1 submitted 27 September, 2019;
originally announced September 2019.
-
OpenSpiel: A Framework for Reinforcement Learning in Games
Authors:
Marc Lanctot,
Edward Lockhart,
Jean-Baptiste Lespiau,
Vinicius Zambaldi,
Satyaki Upadhyay,
Julien Pérolat,
Sriram Srinivasan,
Finbarr Timbers,
Karl Tuyls,
Shayegan Omidshafiei,
Daniel Hennes,
Dustin Morrill,
Paul Muller,
Timo Ewalds,
Ryan Faulkner,
János Kramár,
Bart De Vylder,
Brennan Saeta,
James Bradbury,
David Ding,
Sebastian Borgeaud,
Matthew Lai,
Julian Schrittwieser,
Thomas Anthony,
Edward Hughes
, et al. (2 additional authors not shown)
Abstract:
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi- agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partia…
▽ More
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi- agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partially- and fully- observable) grid worlds and social dilemmas. OpenSpiel also includes tools to analyze learning dynamics and other common evaluation metrics. This document serves both as an overview of the code base and an introduction to the terminology, core concepts, and algorithms across the fields of reinforcement learning, computational game theory, and search.
△ Less
Submitted 26 September, 2020; v1 submitted 25 August, 2019;
originally announced August 2019.
-
Learning Reciprocity in Complex Sequential Social Dilemmas
Authors:
Tom Eccles,
Edward Hughes,
János Kramár,
Steven Wheelwright,
Joel Z. Leibo
Abstract:
Reciprocity is an important feature of human social interaction and underpins our cooperative nature. What is more, simple forms of reciprocity have proved remarkably resilient in matrix game social dilemmas. Most famously, the tit-for-tat strategy performs very well in tournaments of Prisoner's Dilemma. Unfortunately this strategy is not readily applicable to the real world, in which options to c…
▽ More
Reciprocity is an important feature of human social interaction and underpins our cooperative nature. What is more, simple forms of reciprocity have proved remarkably resilient in matrix game social dilemmas. Most famously, the tit-for-tat strategy performs very well in tournaments of Prisoner's Dilemma. Unfortunately this strategy is not readily applicable to the real world, in which options to cooperate or defect are temporally and spatially extended. Here, we present a general online reinforcement learning algorithm that displays reciprocal behavior towards its co-players. We show that it can induce pro-social outcomes for the wider group when learning alongside selfish agents, both in a $2$-player Markov game, and in $5$-player intertemporal social dilemmas. We analyse the resulting policies to show that the reciprocating agents are strongly influenced by their co-players' behavior.
△ Less
Submitted 19 March, 2019;
originally announced March 2019.
-
Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research
Authors:
Joel Z. Leibo,
Edward Hughes,
Marc Lanctot,
Thore Graepel
Abstract:
Evolution has produced a multi-scale mosaic of interacting adaptive units. Innovations arise when perturbations push parts of the system away from stable equilibria into new regimes where previously well-adapted solutions no longer work. Here we explore the hypothesis that multi-agent systems sometimes display intrinsic dynamics arising from competition and cooperation that provide a naturally eme…
▽ More
Evolution has produced a multi-scale mosaic of interacting adaptive units. Innovations arise when perturbations push parts of the system away from stable equilibria into new regimes where previously well-adapted solutions no longer work. Here we explore the hypothesis that multi-agent systems sometimes display intrinsic dynamics arising from competition and cooperation that provide a naturally emergent curriculum, which we term an autocurriculum. The solution of one social task often begets new social tasks, continually generating novel challenges, and thereby promoting innovation. Under certain conditions these challenges may become increasingly complex over time, demanding that agents accumulate ever more innovations.
△ Less
Submitted 11 March, 2019; v1 submitted 2 March, 2019;
originally announced March 2019.
-
The Hanabi Challenge: A New Frontier for AI Research
Authors:
Nolan Bard,
Jakob N. Foerster,
Sarath Chandar,
Neil Burch,
Marc Lanctot,
H. Francis Song,
Emilio Parisotto,
Vincent Dumoulin,
Subhodeep Moitra,
Edward Hughes,
Iain Dunning,
Shibl Mourad,
Hugo Larochelle,
Marc G. Bellemare,
Michael Bowling
Abstract:
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains…
▽ More
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet well-defined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay with two to five players and imperfect information. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques for such theory of mind reasoning will not only be crucial for success in Hanabi, but also in broader collaborative efforts, especially those with human partners. To facilitate future research, we introduce the open-source Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current state-of-the-art techniques.
△ Less
Submitted 6 December, 2019; v1 submitted 1 February, 2019;
originally announced February 2019.
-
Causal Reasoning from Meta-reinforcement Learning
Authors:
Ishita Dasgupta,
Jane Wang,
Silvia Chiappa,
Jovana Mitrovic,
Pedro Ortega,
David Raposo,
Edward Hughes,
Peter Battaglia,
Matthew Botvinick,
Zeb Kurth-Nelson
Abstract:
Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find that the trained agent can perform causal reasoning in novel…
▽ More
Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents. Here we explore whether causal reasoning can emerge via meta-reinforcement learning. We train a recurrent network with model-free reinforcement learning to solve a range of problems that each contain causal structure. We find that the trained agent can perform causal reasoning in novel situations in order to obtain rewards. The agent can select informative interventions, draw causal inferences from observational data, and make counterfactual predictions. Although established formal causal reasoning algorithms also exist, in this paper we show that such reasoning can arise from model-free reinforcement learning, and suggest that causal reasoning in complex settings may benefit from the more end-to-end learning-based approaches presented here. This work also offers new strategies for structured exploration in reinforcement learning, by providing agents with the ability to perform -- and interpret -- experiments.
△ Less
Submitted 23 January, 2019;
originally announced January 2019.
-
Malthusian Reinforcement Learning
Authors:
Joel Z. Leibo,
Julien Perolat,
Edward Hughes,
Steven Wheelwright,
Adam H. Marblestone,
Edgar Duéñez-Guzmán,
Peter Sunehag,
Iain Dunning,
Thore Graepel
Abstract:
Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation. In Malthusian RL, increases in a subpopulation's average return drive subsequent increases in its size, just as Thomas Malthus argued in 1798 was the relationship betwe…
▽ More
Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation. In Malthusian RL, increases in a subpopulation's average return drive subsequent increases in its size, just as Thomas Malthus argued in 1798 was the relationship between preindustrial income levels and population growth. Malthusian reinforcement learning harnesses the competitive pressures arising from growing and shrinking population size to drive agents to explore regions of state and policy spaces that they could not otherwise reach. Furthermore, in environments where there are potential gains from specialization and division of labor, we show that Malthusian reinforcement learning is better positioned to take advantage of such synergies than algorithms based on self-play.
△ Less
Submitted 3 March, 2019; v1 submitted 17 December, 2018;
originally announced December 2018.
-
Evolving intrinsic motivations for altruistic behavior
Authors:
Jane X. Wang,
Edward Hughes,
Chrisantha Fernando,
Wojciech M. Czarnecki,
Edgar A. Duenez-Guzman,
Joel Z. Leibo
Abstract:
Multi-agent cooperation is an important feature of the natural world. Many tasks involve individual incentives that are misaligned with the common good, yet a wide range of organisms from bacteria to insects and humans are able to overcome their differences and collaborate. Therefore, the emergence of cooperative behavior amongst self-interested individuals is an important question for the fields…
▽ More
Multi-agent cooperation is an important feature of the natural world. Many tasks involve individual incentives that are misaligned with the common good, yet a wide range of organisms from bacteria to insects and humans are able to overcome their differences and collaborate. Therefore, the emergence of cooperative behavior amongst self-interested individuals is an important question for the fields of multi-agent reinforcement learning (MARL) and evolutionary theory. Here, we study a particular class of multi-agent problems called intertemporal social dilemmas (ISDs), where the conflict between the individual and the group is particularly sharp. By combining MARL with appropriately structured natural selection, we demonstrate that individual inductive biases for cooperation can be learned in a model-free way. To achieve this, we introduce an innovative modular architecture for deep reinforcement learning agents which supports multi-level selection. We present results in two challenging environments, and interpret these in the context of cultural and ecological evolution.
△ Less
Submitted 11 March, 2019; v1 submitted 14 November, 2018;
originally announced November 2018.
-
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
Authors:
Jakob N. Foerster,
Francis Song,
Edward Hughes,
Neil Burch,
Iain Dunning,
Shimon Whiteson,
Matthew Botvinick,
Michael Bowling
Abstract:
When observing the actions of others, humans make inferences about why they acted as they did, and what this implies about the world; humans also use the fact that their actions will be interpreted in this manner, allowing them to act informatively and thereby communicate efficiently with others. Although learning algorithms have recently achieved superhuman performance in a number of two-player,…
▽ More
When observing the actions of others, humans make inferences about why they acted as they did, and what this implies about the world; humans also use the fact that their actions will be interpreted in this manner, allowing them to act informatively and thereby communicate efficiently with others. Although learning algorithms have recently achieved superhuman performance in a number of two-player, zero-sum games, scalable multi-agent reinforcement learning algorithms that can discover effective strategies and conventions in complex, partially observable settings have proven elusive. We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment. BAD introduces a new Markov decision process, the public belief MDP, in which the action space consists of all deterministic partial policies, and exploits the fact that an agent acting only on this public belief state can still learn to use its private information if the action space is augmented to be over all partial policies mapping private information into environment actions. The Bayesian update is closely related to the theory of mind reasoning that humans carry out when observing others' actions. We first validate BAD on a proof-of-principle two-step matrix game, where it outperforms policy gradient methods; we then evaluate BAD on the challenging, cooperative partial-information card game Hanabi, where, in the two-player setting, it surpasses all previously published learning and hand-coded approaches, establishing a new state of the art.
△ Less
Submitted 10 September, 2019; v1 submitted 4 November, 2018;
originally announced November 2018.
-
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
Authors:
Natasha Jaques,
Angeliki Lazaridou,
Edward Hughes,
Caglar Gulcehre,
Pedro A. Ortega,
DJ Strouse,
Joel Z. Leibo,
Nando de Freitas
Abstract:
We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agen…
▽ More
We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agents. Actions that lead to bigger changes in other agents' behavior are considered influential and are rewarded. We show that this is equivalent to rewarding agents for having high mutual information between their actions. Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically increasing the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols. The influence rewards for all agents can be computed in a decentralized way by enabling agents to learn a model of other agents using deep neural networks. In contrast, key previous works on emergent communication in the MARL setting were unable to learn diverse policies in a decentralized manner and had to resort to centralized training. Consequently, the influence reward opens up a window of new opportunities for research in this area.
△ Less
Submitted 18 June, 2019; v1 submitted 19 October, 2018;
originally announced October 2018.
-
Learning to Understand Goal Specifications by Modelling Reward
Authors:
Dzmitry Bahdanau,
Felix Hill,
Jan Leike,
Edward Hughes,
Arian Hosseini,
Pushmeet Kohli,
Edward Grefenstette
Abstract:
Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards. However, this places on environment designers the onus of designing language-conditional reward functions which may not be easily or tractably implemented as the complexity of the environment and the language scales. To overcome this limitation, we prese…
▽ More
Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards. However, this places on environment designers the onus of designing language-conditional reward functions which may not be easily or tractably implemented as the complexity of the environment and the language scales. To overcome this limitation, we present a framework within which instruction-conditional RL agents are trained using rewards obtained not from the environment, but from reward models which are jointly trained from expert examples. As reward models improve, they learn to accurately reward agents for completing tasks for environment configurations---and for instructions---not present amongst the expert data. This framework effectively separates the representation of what instructions require from how they can be executed. In a simple grid world, it enables an agent to learn a range of commands requiring interaction with blocks and understanding of spatial relations and underspecified abstract arrangements. We further show the method allows our agent to adapt to changes in the environment without requiring new expert examples.
△ Less
Submitted 23 December, 2019; v1 submitted 5 June, 2018;
originally announced June 2018.
-
Inequity aversion improves cooperation in intertemporal social dilemmas
Authors:
Edward Hughes,
Joel Z. Leibo,
Matthew G. Phillips,
Karl Tuyls,
Edgar A. Duéñez-Guzmán,
Antonio García Castañeda,
Iain Dunning,
Tina Zhu,
Kevin R. McKee,
Raphael Koster,
Heather Roff,
Thore Graepel
Abstract:
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However…
▽ More
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
△ Less
Submitted 27 September, 2018; v1 submitted 23 March, 2018;
originally announced March 2018.
-
Hand2Face: Automatic Synthesis and Recognition of Hand Over Face Occlusions
Authors:
Behnaz Nojavanasghari,
Charles. E. Hughes,
Tadas Baltrusaitis,
Louis-philippe Morency
Abstract:
A person's face discloses important information about their affective state. Although there has been extensive research on recognition of facial expressions, the performance of existing approaches is challenged by facial occlusions. Facial occlusions are often treated as noise and discarded in recognition of affective states. However, hand over face occlusions can provide additional information fo…
▽ More
A person's face discloses important information about their affective state. Although there has been extensive research on recognition of facial expressions, the performance of existing approaches is challenged by facial occlusions. Facial occlusions are often treated as noise and discarded in recognition of affective states. However, hand over face occlusions can provide additional information for recognition of some affective states such as curiosity, frustration and boredom. One of the reasons that this problem has not gained attention is the lack of naturalistic occluded faces that contain hand over face occlusions as well as other types of occlusions. Traditional approaches for obtaining affective data are time demanding and expensive, which limits researchers in affective computing to work on small datasets. This limitation affects the generalizability of models and deprives researchers from taking advantage of recent advances in deep learning that have shown great success in many fields but require large volumes of data. In this paper, we first introduce a novel framework for synthesizing naturalistic facial occlusions from an initial dataset of non-occluded faces and separate images of hands, reducing the costly process of data collection and annotation. We then propose a model for facial occlusion type recognition to differentiate between hand over face occlusions and other types of occlusions such as scarves, hair, glasses and objects. Finally, we present a model to localize hand over face occlusions and identify the occluded regions of the face.
△ Less
Submitted 16 August, 2017; v1 submitted 1 August, 2017;
originally announced August 2017.