Showing 1–8 of 8 results for author: Lewis, R L

  1. arXiv:2310.15940  [pdf, other]

    cs.AI cs.LG

    Combining Behaviors with the Successor Features Keyboard

    Authors: Wilka Carvalho, Andre Saraiva, Angelos Filos, Andrew Kyle Lampinen, Loic Matthey, Richard L. Lewis, Honglak Lee, Satinder Singh, Danilo J. Rezende, Daniel Zoran

    Abstract: The Option Keyboard (OK) was recently proposed as a method for transferring behavioral knowledge across tasks. OK transfers knowledge by adaptively combining subsets of known behaviors using Successor Features (SFs) and Generalized Policy Improvement (GPI). However, it relies on hand-designed state-features and task encodings which are cumbersome to design for every new environment. In this work,…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  2. arXiv:2305.17626  [pdf, other]

    cs.AI cs.CL cs.LG

    In-Context Analogical Reasoning with Pre-Trained Language Models

    Authors: Xiaoyang Hu, Shane Storks, Richard L. Lewis, Joyce Chai

    Abstract: Analogical reasoning is a fundamental capacity of human cognition that allows us to reason abstractly about novel situations by relating them to past experiences. While it is thought to be essential for robust reasoning in AI systems, conventional approaches require significant training and/or hard-coding of domain knowledge to be applied to benchmark tasks. Inspired by cognitive science research…

    Submitted 5 June, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

  3. arXiv:2301.12305  [pdf, other]

    cs.LG cs.AI

    Composing Task Knowledge with Modular Successor Feature Approximators

    Authors: Wilka Carvalho, Angelos Filos, Richard L. Lewis, Honglak Lee, Satinder Singh

    Abstract: Recently, the Successor Features and Generalized Policy Improvement (SF&GPI) framework has been proposed as a method for learning, composing, and transferring predictive knowledge and behavior. SF&GPI works by having an agent learn predictive representations (SFs) that can be combined for transfer to new tasks with GPI. However, to be effective this approach requires state features that are useful…

    Submitted 25 August, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

    Comments: Accepted to ICLR 2023

  4. arXiv:2210.03821  [pdf, other]

    cs.LG

    Large Language Models can Implement Policy Iteration

    Authors: Ethan Brooks, Logan Walls, Richard L. Lewis, Satinder Singh

    Abstract: This work presents In-Context Policy Iteration, an algorithm for performing Reinforcement Learning (RL), in-context, using foundation models. While the application of foundation models to RL has received considerable attention, most approaches rely on either (1) the curation of expert demonstrations (either through manual design or task-specific pretraining) or (2) adaptation to the task of intere…

    Submitted 13 August, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: 10 pages, 4 figures, submitted to ICLR 2023

  5. arXiv:2104.12874  [pdf, other]

    cs.CL

    Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention

    Authors: Soo Hyun Ryu, Richard L. Lewis

    Abstract: We advance a novel explanation of similarity-based interference effects in subject-verb and reflexive pronoun agreement processing, grounded in surprisal values computed from a pretrained large-scale Transformer model, GPT-2. Specifically, we show that surprisal of the verb or reflexive pronoun predicts facilitatory interference effects in ungrammatical sentences, where a distractor noun that matc…

    Submitted 26 April, 2021; originally announced April 2021.

    Comments: CMCL 2021

  6. arXiv:2102.13195  [pdf, other]

    cs.LG

    Reinforcement Learning of Implicit and Explicit Control Flow in Instructions

    Authors: Ethan A. Brooks, Janarthanan Rajendran, Richard L. Lewis, Satinder Singh

    Abstract: Learning to flexibly follow task instructions in dynamic environments poses interesting challenges for reinforcement learning agents. We focus here on the problem of learning control flow that deviates from a strict step-by-step execution of instructions -- that is, control flow that may skip forward over parts of the instructions or return backward to previously completed or skipped steps. Demand…

    Submitted 29 June, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

  7. Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment

    Authors: Wilka Carvalho, Anthony Liang, Kimin Lee, Sungryull Sohn, Honglak Lee, Richard L. Lewis, Satinder Singh

    Abstract: First-person object-interaction tasks in high-fidelity, 3D, simulated environments such as the AI2Thor virtual home-environment pose significant sample-efficiency challenges for reinforcement learning (RL) agents learning from sparse task rewards. To alleviate these challenges, prior work has provided extensive supervision via a combination of reward-shaping, ground-truth object-information, and e…

    Submitted 20 May, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: Accepted to IJCAI 2021

    Journal ref: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021)

  8. arXiv:1203.3518  [pdf]

    cs.LG cs.AI stat.ML

    Variance-Based Rewards for Approximate Bayesian Reinforcement Learning

    Authors: Jonathan Sorg, Satinder Singh, Richard L. Lewis

    Abstract: The explore-exploit dilemma is one of the central challenges in Reinforcement Learning (RL). Bayesian RL solves the dilemma by providing the agent with information in the form of a prior distribution over environments; however, full Bayesian planning is intractable. Planning with the mean MDP is a common myopic approximation of Bayesian planning. We derive a novel reward bonus that is a function o…

    Submitted 15 March, 2012; originally announced March 2012.

    Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

    Report number: UAI-P-2010-PG-564-571