Showing 1–10 of 10 results for author: Jenner, E

Searching in archive cs.
  1. arXiv:2406.00877  [pdf, other]

    cs.LG cs.AI

    Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

    Authors: Erik Jenner, Shreyas Kapur, Vasil Georgiev, Cameron Allen, Scott Emmons, Stuart Russell

    Abstract: Do neural networks learn to implement algorithms such as look-ahead or search "in the wild"? Or do they rely purely on collections of simple heuristics? We present evidence of learned look-ahead in the policy network of Leela Chess Zero, the currently strongest neural chess engine. We find that Leela internally represents future optimal moves and that these representations are crucial for its fina…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Project page: https://leela-interp.github.io/

  2. arXiv:2405.20519  [pdf, other]

    cs.AI

    Diffusion On Syntax Trees For Program Synthesis

    Authors: Shreyas Kapur, Erik Jenner, Stuart Russell

    Abstract: Large language models generate code one token at a time. Their autoregressive generation process lacks the feedback of observing the program's output. Training LLMs to suggest edits directly can be challenging due to the scarcity of rich edit data. To address these problems, we propose neural diffusion models that operate on syntax trees of any context-free grammar. Similar to image diffusion mode…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: https://tree-diffusion.github.io

  3. arXiv:2404.09932  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    Foundational Challenges in Assuring Alignment and Safety of Large Language Models

    Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, et al. (13 additional authors not shown)

    Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose $200+$ concrete research questions.

    Submitted 15 April, 2024; originally announced April 2024.

  4. arXiv:2402.17747  [pdf, other]

    cs.LG cs.AI stat.ML

    When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

    Authors: Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

    Abstract: Past analyses of reinforcement learning from human feedback (RLHF) assume that the human evaluators fully observe the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deceptive inflation and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is…

    Submitted 8 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  5. arXiv:2309.15257  [pdf, other]

    cs.LG cs.AI

    STARC: A General Framework For Quantifying Differences Between Reward Functions

    Authors: Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate

    Abstract: In order to solve a task using reinforcement learning, it is necessary to first formalise the goal of that task as a reward function. However, for many real-world tasks, it is very difficult to manually specify a reward function that never incentivises undesirable behaviour. As a result, it is increasingly popular to use \emph{reward learning algorithms}, which attempt to \emph{learn} a reward fun…

    Submitted 11 March, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

  6. arXiv:2211.11972  [pdf, other]

    cs.LG cs.AI

    imitation: Clean Imitation Learning Implementations

    Authors: Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell

    Abstract: imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch. We include three inverse reinforcement learning (IRL) algorithms, three imitation learning algorithms and a preference comparison algorithm. The implementations have been benchmarked against previous results, and automated tests cover 98% of the code. Moreover, the algorithms are implemented in a…

    Submitted 21 November, 2022; originally announced November 2022.

  7. arXiv:2208.09570  [pdf, ps, other]

    cs.LG

    Calculus on MDPs: Potential Shaping as a Gradient

    Authors: Erik Jenner, Herke van Hoof, Adam Gleave

    Abstract: In reinforcement learning, different reward functions can be equivalent in terms of the optimal policies they induce. A particularly well-known and important example is potential shaping, a class of functions that can be added to any reward function without changing the optimal policy set under arbitrary transition dynamics. Potential shaping is conceptually similar to potentials, conservative vec…

    Submitted 2 December, 2022; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: Fixed mistake in proof that affected several results

  8. arXiv:2203.13553  [pdf, other]

    cs.LG

    Preprocessing Reward Functions for Interpretability

    Authors: Erik Jenner, Adam Gleave

    Abstract: In many real-world applications, the reward function is too complex to be manually specified. In such cases, reward functions must instead be learned from human feedback. Since the learned reward may fail to represent user preferences, it is important to be able to validate the learned reward function prior to deployment. One promising approach is to apply interpretability tools to the reward func…

    Submitted 25 March, 2022; originally announced March 2022.

    Comments: Presented at the NeurIPS 2021 Cooperative AI workshop. Code available at https://github.com/HumanCompatibleAI/reward-preprocessing

  9. arXiv:2110.02750  [pdf, other]

    cs.DS cs.CV cs.LG stat.ML

    Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice

    Authors: Erik Jenner, Enrique Fita Sanmartín, Fred A. Hamprecht

    Abstract: The minimum graph cut and minimum $s$-$t$-cut problems are important primitives in the modeling of combinatorial problems in computer science, including in computer vision and machine learning. Some of the most efficient algorithms for finding global minimum cuts are randomized algorithms based on Karger's groundbreaking contraction algorithm. Here, we study whether Karger's algorithm can be succe…

    Submitted 16 December, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: Oral at ICCV 2021; added acknowledgements

  10. arXiv:2106.10163  [pdf, other]

    cs.LG cs.CV

    Steerable Partial Differential Operators for Equivariant Neural Networks

    Authors: Erik Jenner, Maurice Weiler

    Abstract: Recent work in equivariant deep learning bears strong similarities to physics. Fields over a base space are fundamental entities in both subjects, as are equivariant maps between these fields. In deep learning, however, these maps are usually defined by convolutions with a kernel, whereas they are partial differential operators (PDOs) in physics. Developing the theory of equivariant PDOs in the co…

    Submitted 23 April, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: Published at ICLR 2022, code available at https://github.com/ejnnr/steerable_pdos