Showing 1–20 of 20 results for author: Buesing, L

Searching in archive cs.
  1. arXiv:2301.05158  [pdf, other]

    cs.CV cs.AI cs.LG

    SemPPL: Predicting pseudo-labels for better contrastive representations

    Authors: Matko Bošnjak, Pierre H. Richemond, Nenad Tomasev, Florian Strub, Jacob C. Walker, Felix Hill, Lars Holger Buesing, Razvan Pascanu, Charles Blundell, Jovana Mitrovic

    Abstract: Learning from large amounts of unsupervised data and a small amount of supervision is an important open problem in computer vision. We propose a new semi-supervised learning method, Semantic Positives via Pseudo-Labels (SemPPL), that combines labelled and unlabelled data to learn informative representations. Our method extends self-supervised contrastive learning -- where representations are shape…

    Submitted 10 January, 2024; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: Published as a conference paper at ICLR 2023. For checkpoints and source code see https://github.com/google-deepmind/semppl

  2. arXiv:2201.05119  [pdf, other]

    cs.CV cs.LG stat.ML

    Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

    Authors: Nenad Tomasev, Ioana Bica, Brian McWilliams, Lars Buesing, Razvan Pascanu, Charles Blundell, Jovana Mitrovic

    Abstract: Despite recent progress made by self-supervised methods in representation learning with residual networks, they still underperform supervised learning on the ImageNet classification benchmark, limiting their applicability in performance-critical settings. Building on prior theoretical insights from ReLIC [Mitrovic et al., 2021], we include additional inductive biases into self-supervised learning.…

    Submitted 3 November, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

  3. arXiv:2011.09464  [pdf, other]

    cs.LG

    Counterfactual Credit Assignment in Model-Free Reinforcement Learning

    Authors: Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos

    Abstract: Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e. disentangling the effect of an action on rewards from that of external factors and subsequent actions. To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup. The key idea is to…

    Submitted 14 December, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

  4. arXiv:2011.04021  [pdf, other]

    cs.AI cs.LG

    On the role of planning in model-based deep reinforcement learning

    Authors: Jessica B. Hamrick, Abram L. Friesen, Feryal Behbahani, Arthur Guez, Fabio Viola, Sims Witherspoon, Thomas Anthony, Lars Buesing, Petar Veličković, Théophane Weber

    Abstract: Model-based planning is often thought to be necessary for deep, careful reasoning and generalization in artificial agents. While recent successes of model-based reinforcement learning (MBRL) with deep function approximation have strengthened this hypothesis, the resulting diversity of model-based methods has also made it difficult to track which components drive success and why. In this paper, we…

    Submitted 17 March, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

    Comments: Published at ICLR 2021

  5. arXiv:2010.07922  [pdf, other]

    cs.LG cs.CV stat.ML

    Representation Learning via Invariant Causal Mechanisms

    Authors: Jovana Mitrovic, Brian McWilliams, Jacob Walker, Lars Buesing, Charles Blundell

    Abstract: Self-supervised learning has emerged as a strategy to reduce the reliance on costly supervised signal by pretraining representations only using unlabeled data. These methods combine heuristic proxy classification tasks with data augmentations and have achieved significant success, but our theoretical understanding of this success remains limited. In this paper we analyze self-supervised representa…

    Submitted 15 October, 2020; originally announced October 2020.

  6. arXiv:2010.01298  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban

    Authors: Peter Karkus, Mehdi Mirza, Arthur Guez, Andrew Jaegle, Timothy Lillicrap, Lars Buesing, Nicolas Heess, Theophane Weber

    Abstract: Intelligent robots need to achieve abstract objectives using concrete, spatiotemporally complex sensory information and motor control. Tabula rasa deep reinforcement learning (RL) has tackled demanding tasks in terms of either visual, abstract, or physical reasoning, but solving these jointly remains a formidable challenge. One recent, unsolved benchmark task that integrates these challenges is Mu…

    Submitted 3 October, 2020; originally announced October 2020.

  7. arXiv:2009.05524  [pdf, other]

    cs.AI cs.LG

    Physically Embedded Planning Problems: New Challenges for Reinforcement Learning

    Authors: Mehdi Mirza, Andrew Jaegle, Jonathan J. Hunt, Arthur Guez, Saran Tunyasuvunakool, Alistair Muldal, Théophane Weber, Peter Karkus, Sébastien Racanière, Lars Buesing, Timothy Lillicrap, Nicolas Heess

    Abstract: Recent work in deep reinforcement learning (RL) has produced algorithms capable of mastering challenging games such as Go, chess, or shogi. In these works the RL agent directly observes the natural state of the game and controls that state directly with its actions. However, when humans play such games, they do not just reason about the moves but also interact with their physical environment. They…

    Submitted 29 October, 2020; v1 submitted 11 September, 2020; originally announced September 2020.

    Comments: 17 pages + appendix. Updated text and references

  8. arXiv:2006.06380  [pdf, other]

    stat.ML cs.DS cs.LG

    Pointer Graph Networks

    Authors: Petar Veličković, Lars Buesing, Matthew C. Overlan, Razvan Pascanu, Oriol Vinyals, Charles Blundell

    Abstract: Graph neural networks (GNNs) are typically applied to static graphs that are assumed to be known upfront. This static input structure is often informed purely by insight of the machine learning practitioner, and might not be optimal for the actual task the GNN is solving. In absence of reliable domain expertise, one might resort to inferring the latent graph structure, which is often difficult due…

    Submitted 18 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: To appear at NeurIPS 2020 (Spotlight talk)

  9. arXiv:2004.11410  [pdf, other]

    cs.LG cs.AI stat.ML

    Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning

    Authors: Giambattista Parascandolo, Lars Buesing, Josh Merel, Leonard Hasenclever, John Aslanides, Jessica B. Hamrick, Nicolas Heess, Alexander Neitz, Theophane Weber

    Abstract: Standard planners for sequential decision making (including Monte Carlo planning, tree search, dynamic programming, etc.) are constrained by an implicit sequential planning assumption: The order in which a plan is constructed is the same in which it is executed. We consider alternatives to this assumption for the class of goal-directed Reinforcement Learning (RL) problems. Instead of an environmen…

    Submitted 23 April, 2020; originally announced April 2020.

  10. arXiv:2002.08329  [pdf, other]

    cs.LG stat.ML

    Value-driven Hindsight Modelling

    Authors: Arthur Guez, Fabio Viola, Théophane Weber, Lars Buesing, Steven Kapturowski, Doina Precup, David Silver, Nicolas Heess

    Abstract: Value estimation is a critical component of the reinforcement learning (RL) paradigm. The question of how to effectively learn value predictors from data is one of the major problems studied by the RL community, and different approaches exploit structure in the problem domain in different ways. Model learning can make use of the rich transition structure present in sequences of observations, but t…

    Submitted 20 October, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: 9 pages + reference + appendix. NeurIPS 2020 version

  11. arXiv:2002.02836  [pdf, other]

    cs.LG cs.AI stat.ML

    Causally Correct Partial Models for Reinforcement Learning

    Authors: Danilo J. Rezende, Ivo Danihelka, George Papamakarios, Nan Rosemary Ke, Ray Jiang, Theophane Weber, Karol Gregor, Hamza Merzic, Fabio Viola, Jane Wang, Jovana Mitrovic, Frederic Besse, Ioannis Antonoglou, Lars Buesing

    Abstract: In reinforcement learning, we can learn a model of future observations and rewards, and use it to plan the agent's next actions. However, jointly modeling future observations can be computationally expensive or even intractable if the observations are high-dimensional (e.g. images). For this reason, previous works have considered partial models, which model only part of the observation. In this pa…

    Submitted 7 February, 2020; originally announced February 2020.

  12. arXiv:1912.02807  [pdf, other]

    cs.LG stat.ML

    Combining Q-Learning and Search with Amortized Value Estimates

    Authors: Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Tobias Pfaff, Theophane Weber, Lars Buesing, Peter W. Battaglia

    Abstract: We introduce "Search with Amortized Value Estimates" (SAVE), an approach for combining model-free Q-learning with model-based Monte-Carlo Tree Search (MCTS). In SAVE, a learned prior over state-action values is used to guide MCTS, which estimates an improved set of state-action values. The new Q-estimates are then used in combination with real experience to update the prior. This effectively amort…

    Submitted 10 January, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: Published as a conference paper at ICLR 2020

  13. arXiv:1910.06862  [pdf, other]

    cs.LG cs.AI

    Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions

    Authors: Lars Buesing, Nicolas Heess, Theophane Weber

    Abstract: A plethora of problems in AI, engineering and the sciences are naturally formalized as inference in discrete probabilistic models. Exact inference is often prohibitively expensive, as it may require evaluating the (unnormalized) target density on its entire domain. Here we consider the setting where only a limited budget of calls to the unnormalized density oracle is available, raising the challen…

    Submitted 15 October, 2019; originally announced October 2019.

  14. arXiv:1901.01761  [pdf, other]

    cs.LG stat.ML

    Credit Assignment Techniques in Stochastic Computation Graphs

    Authors: Théophane Weber, Nicolas Heess, Lars Buesing, David Silver

    Abstract: Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires…

    Submitted 7 January, 2019; originally announced January 2019.

  15. arXiv:1811.06272  [pdf, other]

    cs.LG stat.ML

    Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

    Authors: Lars Buesing, Theophane Weber, Yori Zwols, Sebastien Racaniere, Arthur Guez, Jean-Baptiste Lespiau, Nicolas Heess

    Abstract: Learning policies on data synthesized by models can in principle quench the thirst of reinforcement learning algorithms for large amounts of real experience, which is often costly to acquire. However, simulating plausible experience de novo is a hard problem for many complex environments, often resulting in biases for model-based policy evaluation and search. Instead of de novo synthesis of data,…

    Submitted 15 November, 2018; originally announced November 2018.

  16. arXiv:1806.03107  [pdf, other]

    cs.LG stat.ML

    Temporal Difference Variational Auto-Encoder

    Authors: Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Theophane Weber

    Abstract: To act and plan in complex environments, we posit that agents should have a mental simulator of the world with three characteristics: (a) it should build an abstract state representing the condition of the world; (b) it should form a belief which represents uncertainty on the world; (c) it should go beyond simple step-by-step simulation, and exhibit temporal abstraction. Motivated by the absence o…

    Submitted 2 January, 2019; v1 submitted 8 June, 2018; originally announced June 2018.

  17. arXiv:1802.03006  [pdf, other]

    cs.LG

    Learning and Querying Fast Generative Models for Reinforcement Learning

    Authors: Lars Buesing, Theophane Weber, Sebastien Racaniere, S. M. Ali Eslami, Danilo Rezende, David P. Reichert, Fabio Viola, Frederic Besse, Karol Gregor, Demis Hassabis, Daan Wierstra

    Abstract: A key challenge in model-based reinforcement learning (RL) is to synthesize computationally efficient and accurate environment models. We show that carefully designed generative models that learn and operate on compact state representations, so-called state-space models, substantially reduce the computational costs for predicting outcomes of sequences of actions. Extensive experiments establish th…

    Submitted 8 February, 2018; originally announced February 2018.

  18. arXiv:1711.01846  [pdf, other]

    stat.ML cs.LG q-bio.NC

    Fast amortized inference of neural activity from calcium imaging data with variational autoencoders

    Authors: Artur Speiser, Jinyao Yan, Evan Archer, Lars Buesing, Srinivas C. Turaga, Jakob H. Macke

    Abstract: Calcium imaging permits optical measurement of neural activity. Since intracellular calcium concentration is an indirect measurement of neural activity, computational tools are necessary to infer the true underlying spiking activity from fluorescence measurements. Bayesian model inversion can be used to solve this problem, but typically requires either computationally expensive MCMC sampling, or f…

    Submitted 6 November, 2017; originally announced November 2017.

    Comments: NIPS 2017

  19. arXiv:1707.06203  [pdf, other]

    cs.LG cs.AI stat.ML

    Imagination-Augmented Agents for Deep Reinforcement Learning

    Authors: Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra

    Abstract: We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in…

    Submitted 14 February, 2018; v1 submitted 19 July, 2017; originally announced July 2017.

  20. arXiv:1707.06170  [pdf, other]

    cs.AI cs.LG cs.NE stat.ML

    Learning model-based planning from scratch

    Authors: Razvan Pascanu, Yujia Li, Oriol Vinyals, Nicolas Heess, Lars Buesing, Sebastien Racanière, David Reichert, Théophane Weber, Daan Wierstra, Peter Battaglia

    Abstract: Conventional wisdom holds that model-based planning is a powerful approach to sequential decision-making. It is often very challenging in practice, however, because while a model can be used to evaluate a plan, it does not prescribe how to construct a plan. Here we introduce the "Imagination-based Planner", the first model-based, sequential decision-making agent that can learn to construct, evalua…

    Submitted 19 July, 2017; originally announced July 2017.