
Showing 1–50 of 80 results for author: Blunsom, P

Searching in archive cs.
  1. arXiv:2408.08274

    cs.LG

    BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

    Authors: Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Ustun, Acyr Locatelli

    Abstract: The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive. Existing methods mitigate this by pre-training multiple dense expert models independently and using them to initialize an MoE. This is done by using experts' feed…

    Submitted 16 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  2. arXiv:2406.09347

    cs.LG stat.ML

    Separations in the Representational Capabilities of Transformers and Recurrent Architectures

    Authors: Satwik Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade

    Abstract: Transformer architectures have been widely adopted in foundation models. Due to their high inference costs, there is renewed interest in exploring the potential of efficient recurrent architectures (RNNs). In this paper, we analyze the differences in the representational capabilities of Transformers and RNNs across several tasks of practical relevance, including index lookup, nearest neighbor, rec…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Preprint

  3. arXiv:2405.20850

    cs.CL

    Improving Reward Models with Synthetic Critiques

    Authors: Zihuiwen Ye, Fraser Greenlee-Scott, Max Bartolo, Phil Blunsom, Jon Ander Campos, Matthias Gallé

    Abstract: Reward models (RM) play a critical role in aligning language models through the process of reinforcement learning from human feedback. RMs are trained to predict a score reflecting human preference, which requires significant time and cost for human annotation. Additionally, RMs tend to quickly overfit on superficial features in the training set, hindering their generalization performance on unsee…

    Submitted 31 May, 2024; originally announced May 2024.

  4. arXiv:2405.15032

    cs.CL

    Aya 23: Open Weight Releases to Further Multilingual Progress

    Authors: Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker

    Abstract: This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-art language modelin…

    Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2402.07827

    cs.CL

    Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

    Authors: Ahmet Üstün, Viraat Aryabumi, Zheng-Xin Yong, Wei-Yin Ko, Daniel D'souza, Gbemileke Onilude, Neel Bhandari, Shivalika Singh, Hui-Lee Ooi, Amr Kayid, Freddie Vargus, Phil Blunsom, Shayne Longpre, Niklas Muennighoff, Marzieh Fadaee, Julia Kreutzer, Sara Hooker

    Abstract: Recent breakthroughs in large language models (LLMs) have centered around a handful of data-rich languages. What does it take to broaden access to breakthroughs beyond first-class citizen languages? Our work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages of which over 50% are considered as lower-resourced. Aya outperforms mT0 and BLOOM…

    Submitted 12 February, 2024; originally announced February 2024.

  6. arXiv:2310.03016

    cs.LG cs.CL

    Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions

    Authors: Satwik Bhattamishra, Arkil Patel, Phil Blunsom, Varun Kanade

    Abstract: In order to understand the in-context learning phenomenon, recent works have adopted a stylized experimental framework and demonstrated that Transformers can learn gradient-based learning algorithms for various classes of real-valued functions. However, the limitations of Transformers in implementing learning algorithms, and their ability to learn other forms of algorithms are not well understood.…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Preprint

  7. arXiv:2309.16349

    cs.CL

    Human Feedback is not Gold Standard

    Authors: Tom Hosking, Phil Blunsom, Max Bartolo

    Abstract: Human feedback has become the de facto standard for evaluating the performance of Large Language Models, and is increasingly being used as a training objective. However, it is not clear which properties of a generated output this single 'preference' score captures. We hypothesise that preference scores are subjective and open to undesirable biases. We critically analyse the use of human feedback f…

    Submitted 16 January, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Accepted at ICLR 2024

  8. arXiv:2307.16795

    cs.CL cs.AI cs.LG

    Structural Transfer Learning in NL-to-Bash Semantic Parsers

    Authors: Kyle Duffy, Satwik Bhattamishra, Phil Blunsom

    Abstract: Large-scale pre-training has made progress in many fields of natural language processing, though little is understood about the design of pre-training datasets. We propose a methodology for obtaining a quantitative understanding of structural overlap between machine translation tasks. We apply our methodology to the natural language to Bash semantic parsing task (NLBash) and show that it is largel…

    Submitted 31 July, 2023; originally announced July 2023.

  9. arXiv:2306.02870

    cs.CL

    On "Scientific Debt" in NLP: A Case for More Rigour in Language Model Pre-Training Research

    Authors: Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Alham Fikri Aji, Genta Indra Winata, Radityo Eko Prasojo, Phil Blunsom, Adhiguna Kuncoro

    Abstract: This evidence-based position paper critiques current research practices within the language model pre-training literature. Despite rapid recent progress afforded by increasingly better pre-trained language models (PLMs), current PLM research practices often conflate different possible sources of model improvement, without conducting proper ablation studies and principled comparisons between differ…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted at ACL 2023

  10. arXiv:2305.19268

    cs.LG cs.AI

    Intriguing Properties of Quantization at Scale

    Authors: Arash Ahmadian, Saurabh Dash, Hongyu Chen, Bharat Venkitesh, Stephen Gou, Phil Blunsom, Ahmet Üstün, Sara Hooker

    Abstract: Emergent properties have been widely adopted as a term to describe behavior not present in smaller models but observed in larger models. Recent work suggests that the trade-off incurred by quantization is also an emergent property, with sharp drops in performance in models over 6B parameters. In this work, we ask "are quantization cliffs in performance solely a factor of scale?" Against a backdrop…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 32 pages, 14 figures

  11. arXiv:2211.12316

    cs.LG cs.CL

    Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions

    Authors: Satwik Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom

    Abstract: Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in practice and whether they have any properties that enable them to generalize better than recurrent models. In this work, we conduct an extensive empirical study on Bo…

    Submitted 10 July, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: ACL 2023

  12. arXiv:2210.12096

    cs.CL

    Augmenting Multi-Turn Text-to-SQL Datasets with Self-Play

    Authors: Qi Liu, Zihuiwen Ye, Tao Yu, Phil Blunsom, Linfeng Song

    Abstract: The task of context-dependent text-to-SQL aims to convert multi-turn user utterances to formal SQL queries. This is a challenging task due to both the scarcity of training data from which to learn complex contextual dependencies and to generalize to unseen databases. In this paper we explore augmenting the training datasets using self-play, which leverages contextual information to synthesize new…

    Submitted 21 October, 2022; originally announced October 2022.

  13. arXiv:2205.12191

    cs.CL cs.AI cs.CV cs.LG

    Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization

    Authors: Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh

    Abstract: Vision-and-language (V&L) models pretrained on large-scale multimodal data have demonstrated strong performance on various tasks such as image captioning and visual question answering (VQA). The quality of such models is commonly assessed by measuring their performance on unseen data that typically comes from the same distribution as the training data. However, when evaluated under out-of-distribu…

    Submitted 1 April, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Findings of EACL 2023. Aishwarya, Ivana, Emanuele and Aida had equal first author contributions. Elnaz and Anita had equal contributions. Aida and Aishwarya had equal senior contributions

  14. arXiv:2205.11388

    cs.CL cs.LG

    StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models

    Authors: Adam Liška, Tomáš Kočiský, Elena Gribovskaya, Tayfun Terzi, Eren Sezener, Devang Agrawal, Cyprien de Masson d'Autume, Tim Scholtes, Manzil Zaheer, Susannah Young, Ellen Gilsenan-McMahon, Sophia Austin, Phil Blunsom, Angeliki Lazaridou

    Abstract: Knowledge and language understanding of models evaluated through question answering (QA) has been usually studied on static snapshots of knowledge, like Wikipedia. However, our world is dynamic, evolves over time, and our models' knowledge becomes outdated. To study how semi-parametric QA models and their underlying parametric language models (LMs) adapt to evolving knowledge, we construct a new l…

    Submitted 23 May, 2022; originally announced May 2022.

  15. arXiv:2203.07402

    cs.CL

    Revisiting the Compositional Generalization Abilities of Neural Sequence Models

    Authors: Arkil Patel, Satwik Bhattamishra, Phil Blunsom, Navin Goyal

    Abstract: Compositional generalization is a fundamental trait in humans, allowing us to effortlessly combine known phrases to form novel sentences. Recent works have claimed that standard seq-to-seq models severely lack the ability to compositionally generalize. In this paper, we focus on one-shot primitive generalization as introduced by the popular SCAN benchmark. We demonstrate that modifying the trainin…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022

  16. Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

    Authors: Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer

    Abstract: We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are implemented through a special attention mask and deterministic transformation of the linearized tree. We find that TGs outperform various strong baselines on sentenc…

    Submitted 6 December, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: 17 pages, 5 figures, 2 tables and 1 algorithm. To appear in TACL, to be presented at EMNLP 2022

  17. arXiv:2201.09680

    cs.CL cs.AI

    Relational Memory Augmented Language Models

    Authors: Qi Liu, Dani Yogatama, Phil Blunsom

    Abstract: We present a memory-augmented approach to condition an autoregressive language model on a knowledge graph. We represent the graph as a collection of relation triples and retrieve relevant relations for a given context to improve text generation. Experiments on WikiText-103, WMT19, and enwik8 English datasets demonstrate that our approach produces a better language model in terms of perplexity and…

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: Accepted to TACL, pre MIT Press publication version

  18. arXiv:2111.00607

    cs.CL

    A Systematic Investigation of Commonsense Knowledge in Large Language Models

    Authors: Xiang Lorraine Li, Adhiguna Kuncoro, Jordan Hoffmann, Cyprien de Masson d'Autume, Phil Blunsom, Aida Nematzadeh

    Abstract: Language models (LMs) trained on large amounts of data have shown impressive performance on many NLP tasks under the zero-shot and few-shot setup. Here we aim to better understand the extent to which such models learn commonsense knowledge, a critical component of many NLP applications. We conduct a systematic and rigorous zero-shot and few-shot commonsense evaluation of large pre-trained LMs, w…

    Submitted 31 October, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

    Comments: Accepted to EMNLP 2022

  19. arXiv:2103.10518

    cs.CL cs.AI cs.LG

    Pretraining the Noisy Channel Model for Task-Oriented Dialogue

    Authors: Qi Liu, Lei Yu, Laura Rimell, Phil Blunsom

    Abstract: Direct decoding for task-oriented dialogue is known to suffer from the explaining-away effect, manifested in models that prefer short and generic responses. Here we argue for the use of Bayes' theorem to factorize the dialogue task into two models, the distribution of the context given the response, and the prior for the response itself. This approach, an instantiation of the noisy channel model,…

    Submitted 18 March, 2021; originally announced March 2021.

    Comments: Accepted to TACL, pre MIT Press publication version
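The Bayes'-theorem factorization described in the abstract above can be illustrated with a minimal reranking sketch: a candidate response r for a fixed context c is scored by log p(c | r) + log p(r), which equals log p(r | c) up to the constant log p(c). This is not the paper's implementation; the toy log-probabilities, the example strings, and the helper names `noisy_channel_score` and `rerank` are hypothetical stand-ins for trained channel and prior models.

```python
import math

# Hypothetical toy log-probabilities for one fixed context,
# e.g. "Can you book a table for two at 7pm?".
# CHANNEL[r] stands in for log p(context | response); PRIOR[r] for log p(response).
CHANNEL = {
    "Sure.": math.log(0.01),
    "Booked a table for two at 7pm.": math.log(0.30),
}
PRIOR = {
    "Sure.": math.log(0.20),
    "Booked a table for two at 7pm.": math.log(0.02),
}


def noisy_channel_score(response: str) -> float:
    """Bayes' rule: log p(r | c) = log p(c | r) + log p(r) - log p(c).

    log p(c) is the same for every candidate of a fixed context, so it is dropped.
    """
    return CHANNEL[response] + PRIOR[response]


def rerank(candidates):
    """Return candidates sorted from highest to lowest noisy-channel score."""
    return sorted(candidates, key=noisy_channel_score, reverse=True)


if __name__ == "__main__":
    ranked = rerank(["Sure.", "Booked a table for two at 7pm."])
    print(ranked[0])  # 0.30 * 0.02 = 0.006 beats 0.01 * 0.20 = 0.002
```

In this toy setting the short generic response has the higher prior, but the channel term rewards the response that accounts for the context, which is the behaviour the factorization is meant to encourage.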

  20. arXiv:2102.01951

    cs.CL cs.AI

    Mind the Gap: Assessing Temporal Generalization in Neural Language Models

    Authors: Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d'Autume, Tomas Kocisky, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom

    Abstract: Our world is open-ended, non-stationary, and constantly evolving; thus what we talk about and how we talk about it change over time. This inherent dynamic nature of language contrasts with the current static language modelling paradigm, which trains and evaluates models on utterances from overlapping time periods. Despite impressive recent progress, we demonstrate that Transformer-XL language mode…

    Submitted 26 October, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: To appear as a Spotlight at NeurIPS 2021

  21. arXiv:2012.00708

    stat.ML cs.CL cs.LG

    Mutual Information Constraints for Monte-Carlo Objectives

    Authors: Gábor Melis, András György, Phil Blunsom

    Abstract: A common failure mode of density models trained as variational autoencoders is to model the data without relying on their latent variables, rendering these variables useless. Two contributing factors, the underspecification of the model and the looseness of the variational lower bound, have been studied separately in the literature. We weave these two strands of research together, specifically the…

    Submitted 9 May, 2022; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: 32 pages, 29 figures

  22. arXiv:2009.11023

    cs.CL

    The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets

    Authors: Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom

    Abstract: For neural models to garner widespread public trust and ensure fairness, we must have human-intelligible explanations for their predictions. Recently, an increasing number of works focus on explaining the predictions of neural models in terms of the relevance of the input features. In this work, we show that feature-based explanations pose problems even for explaining trivial models. We show that,…

    Submitted 14 December, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

    Journal ref: Explainable Agency in Artificial Intelligence Workshop at AAAI 2021
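To make the contrast in the title concrete, here is a minimal sketch that computes exact Shapley values and the smallest sufficient feature subsets for a trivial two-feature OR model. The value function (features outside a subset are replaced by a baseline of 0), the definition of sufficiency, and all function names are assumptions chosen for illustration, not necessarily the paper's exact formulation.

```python
from itertools import combinations
from math import factorial


def _masked(f, x, baseline, subset):
    """Evaluate f with features in `subset` taken from x and the rest from baseline."""
    z = [x[i] if i in subset else baseline[i] for i in range(len(x))]
    return f(z)


def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all coalitions (exponential in len(x))."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                gain = _masked(f, x, baseline, s | {i}) - _masked(f, x, baseline, s)
                phi[i] += weight * gain
    return phi


def smallest_sufficient_subsets(f, x, baseline):
    """All smallest subsets S such that keeping only S from x reproduces f(x)."""
    target = f(x)
    for k in range(len(x) + 1):
        found = [set(c) for c in combinations(range(len(x)), k)
                 if _masked(f, x, baseline, set(c)) == target]
        if found:
            return found
    return []


def f_or(z):
    """A trivial model: outputs 1 if either binary feature is on."""
    return int(z[0] or z[1])


if __name__ == "__main__":
    x, baseline = [1, 1], [0, 0]
    print(shapley_values(f_or, x, baseline))               # [0.5, 0.5]
    print(smallest_sufficient_subsets(f_or, x, baseline))  # [{0}, {1}]
```

The two views answer different questions: Shapley values split the credit equally (0.5 each, summing to f(x) − f(baseline) = 1), whereas under the sufficiency view either feature alone fully explains the prediction.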

  23. arXiv:2005.13482

    cs.CL

    Syntactic Structure Distillation Pretraining For Bidirectional Encoders

    Authors: Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

    Abstract: Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence. Given this success, it remains an open question whether scalable learners like BERT can become fully proficient in the syntax of natural language by virtue of data scale alone, or whether they s…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: 17 pages, 6 tables, 2 figures. AK and LK contributed equally

  24. arXiv:2005.03684

    cs.CL cs.CV

    Learning to Segment Actions from Observation and Narration

    Authors: Daniel Fried, Jean-Baptiste Alayrac, Phil Blunsom, Chris Dyer, Stephen Clark, Aida Nematzadeh

    Abstract: We apply a generative segmental model of task structure, guided by narration, to action segmentation in video. We focus on unsupervised and weakly-supervised settings where no action labels are known during training. Despite its simplicity, our model performs competitively with previous work on a dataset of naturalistic instructional videos. Our model allows us to vary the sources of supervision u…

    Submitted 11 August, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  25. arXiv:2003.07278

    cs.CL cs.AI cs.LG

    A Survey on Contextual Embeddings

    Authors: Qi Liu, Matt J. Kusner, Phil Blunsom

    Abstract: Contextual embeddings, such as ELMo and BERT, move beyond global word representations like Word2Vec and achieve ground-breaking performance on a wide range of natural language processing tasks. Contextual embeddings assign each word a representation based on its context, thereby capturing uses of words across varied contexts and encoding knowledge that transfers across languages. In this survey, w…

    Submitted 13 April, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

    Comments: 13 pages

  26. arXiv:2003.05078

    cs.CV cs.CL cs.LG

    Visual Grounding in Video for Unsupervised Word Translation

    Authors: Gunnar A. Sigurdsson, Jean-Baptiste Alayrac, Aida Nematzadeh, Lucas Smaira, Mateusz Malinowski, João Carreira, Phil Blunsom, Andrew Zisserman

    Abstract: There are thousands of actively spoken languages on Earth, but a single visual world. Grounding in this visual world has the potential to bridge the gap between all these languages. Our goal is to use visual grounding to improve unsupervised word mapping between languages. The key idea is to establish a common visual representation between two languages by learning embeddings from unpaired instruc…

    Submitted 26 March, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: CVPR 2020

    Journal ref: CVPR 2020

  27. arXiv:2001.11128

    cs.CL cs.LG eess.AS

    Learning Robust and Multilingual Speech Representations

    Authors: Kazuya Kawakami, Luyu Wang, Chris Dyer, Phil Blunsom, Aaron van den Oord

    Abstract: Unsupervised speech representation learning has shown remarkable success at finding representations that correlate with phonetic structures and improve downstream speech recognition performance. However, most research has been focused on evaluating the representations in terms of their ability to improve the performance of speech recognition systems on read English (e.g. Wall Street Journal and Li…

    Submitted 29 January, 2020; originally announced January 2020.

  28. arXiv:1910.03065

    cs.CL cs.AI

    Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations

    Authors: Oana-Maria Camburu, Brendan Shillingford, Pasquale Minervini, Thomas Lukasiewicz, Phil Blunsom

    Abstract: To increase trust in artificial intelligence systems, a promising research direction consists of designing neural models capable of generating natural language explanations for their predictions. In this work, we show that such models are nonetheless prone to generating mutually inconsistent explanations, such as "Because there is a dog in the image" and "Because there is no dog in the [same] imag…

    Submitted 2 May, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

    Journal ref: Short Paper at ACL, 2020

  29. arXiv:1910.02065

    cs.CL cs.LG

    Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods

    Authors: Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom

    Abstract: For AI systems to garner widespread public acceptance, we must develop methods capable of explaining the decisions of black-box models such as neural networks. In this work, we identify two issues of current explanatory methods. First, we show that two prevalent perspectives on explanations, feature-additivity and feature-selection, lead to fundamentally different instance-wise explanations.…

    Submitted 5 December, 2019; v1 submitted 4 October, 2019; originally announced October 2019.

    Journal ref: NeurIPS 2019 Workshop on Safety and Robustness in Decision Making, Vancouver, Canada

  30. arXiv:1910.00553

    cs.CL cs.LG

    Better Document-Level Machine Translation with Bayes' Rule

    Authors: Lei Yu, Laurent Sartran, Wojciech Stokowiec, Wang Ling, Lingpeng Kong, Phil Blunsom, Chris Dyer

    Abstract: We show that Bayes' rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents, a compelling benefit as parallel documents are not always available. In our formulation, the posterior probability of a candidate translation is the product of the unconditional (prior) probability of the candidate output doc…

    Submitted 2 July, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted by TACL

  31. arXiv:1909.09428

    cs.CL cs.LG

    A Critical Analysis of Biased Parsers in Unsupervised Parsing

    Authors: Chris Dyer, Gábor Melis, Phil Blunsom

    Abstract: A series of recent papers has used a parsing algorithm due to Shen et al. (2018) to recover phrase-structure trees based on proxies for "syntactic depth." These proxy depths are obtained from the representations learned by recurrent language models augmented with mechanisms that encourage the (unsupervised) discovery of hierarchical structure latent in natural language sentences. Using the same pa…

    Submitted 20 September, 2019; originally announced September 2019.

  32. arXiv:1909.01792

    cs.CL

    Mogrifier LSTM

    Authors: Gábor Melis, Tomáš Kočiský, Phil Blunsom

    Abstract: Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mut…

    Submitted 29 January, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

  33. arXiv:1908.08025

    cs.CL

    WikiCREM: A Large Unsupervised Corpus for Coreference Resolution

    Authors: Vid Kocijan, Oana-Maria Camburu, Ana-Maria Cretu, Yordan Yordanov, Phil Blunsom, Thomas Lukasiewicz

    Abstract: Pronoun resolution is a major area of natural language understanding. However, large-scale training sets are still scarce, since manually labelling data is costly. In this work, we introduce WikiCREM (Wikipedia CoREferences Masked) a large-scale, yet accurate dataset of pronoun disambiguation instances. We use a language-model-based approach for pronoun resolution in combination with our WikiCREM…

    Submitted 13 October, 2019; v1 submitted 21 August, 2019; originally announced August 2019.

    Comments: Accepted to the EMNLP 2019 conference

    Journal ref: IJCNLP-EMNLP 2019

  34. arXiv:1906.06438

    cs.CL cs.LG

    Scalable Syntax-Aware Language Models Using Knowledge Distillation

    Authors: Adhiguna Kuncoro, Chris Dyer, Laura Rimell, Stephen Clark, Phil Blunsom

    Abstract: Prior work has shown that, on small amounts of training data, syntactic neural language models learn structurally sensitive generalisations more successfully than sequential language models. However, their computational complexity renders scaling difficult, and it remains an open question whether structural biases are still necessary when sequential models have access to ever larger amounts of tra…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: ACL 2019

  35. arXiv:1901.11373

    cs.LG cs.CL stat.ML

    Learning and Evaluating General Linguistic Intelligence

    Authors: Dani Yogatama, Cyprien de Masson d'Autume, Jerome Connor, Tomas Kocisky, Mike Chrzanowski, Lingpeng Kong, Angeliki Lazaridou, Wang Ling, Lei Yu, Chris Dyer, Phil Blunsom

    Abstract: We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation to evaluate them against these criteria through a series of ex…

    Submitted 31 January, 2019; originally announced January 2019.

  36. arXiv:1812.01193

    cs.CL

    e-SNLI: Natural Language Inference with Natural Language Explanations

    Authors: Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom

    Abstract: In order for machine learning to garner widespread public adoption, models must be able to provide interpretable and robust explanations for their decisions, as well as learn from human-provided explanations at train time. In this work, we extend the Stanford Natural Language Inference dataset with an additional layer of human-annotated natural language explanations of the entailment relations. We…

    Submitted 6 December, 2018; v1 submitted 3 December, 2018; originally announced December 2018.

    Comments: NeurIPS 2018

  37. arXiv:1811.10756

    cs.RO

    Learning with Stochastic Guidance for Navigation

    Authors: Linhai Xie, Yishu Miao, Sen Wang, Phil Blunsom, Zhihua Wang, Changhao Chen, Andrew Markham, Niki Trigoni

    Abstract: Due to the sparse rewards and high degree of environment variation, reinforcement learning approaches such as Deep Deterministic Policy Gradient (DDPG) are plagued by issues of high variance when applied in complex real world environments. We present a new framework for overcoming these issues by incorporating a stochastic switch, allowing an agent to choose between high and low variance policies.…

    Submitted 26 November, 2018; originally announced November 2018.

    Comments: A short version is accepted by the NIPS 2018 workshop: Infer2Control

  38. arXiv:1811.10475

    cs.CL cs.AI cs.LG

    Sentence Encoding with Tree-constrained Relation Networks

    Authors: Lei Yu, Cyprien de Masson d'Autume, Chris Dyer, Phil Blunsom, Lingpeng Kong, Wang Ling

    Abstract: The meaning of a sentence is a function of the relations that hold between its words. We instantiate this relational view of semantics in a series of neural models based on variants of relation networks (RNs) which represent a set of objects (for us, words forming a sentence) in terms of representations of pairs of objects. We propose two extensions to the basic RN model for natural language. Firs…

    Submitted 26 November, 2018; originally announced November 2018.

    Comments: 12 pages

  39. arXiv:1811.09353

    cs.CL

    Learning to Discover, Ground and Use Words with Segmental Neural Language Models

    Authors: Kazuya Kawakami, Chris Dyer, Phil Blunsom

    Abstract: We propose a segmental neural language model that combines the generalization power of neural networks with the ability to discover word-like units that are latent in unsegmented character sequences. In contrast to previous segmentation models that treat word segmentation as an isolated task, our model unifies word discovery, learning how words fit together to form sentences, and, by conditioning…

    Submitted 18 June, 2019; v1 submitted 22 November, 2018; originally announced November 2018.

  40. arXiv:1810.02076

    cs.LG cs.CV cs.RO stat.ML

    Transferring Physical Motion Between Domains for Neural Inertial Tracking

    Authors: Changhao Chen, Yishu Miao, Chris Xiaoxuan Lu, Phil Blunsom, Andrew Markham, Niki Trigoni

    Abstract: Inertial information processing plays a pivotal role in ego-motion awareness for mobile agents, as inertial measurements are entirely egocentric and not environment dependent. However, they are affected greatly by changes in sensor placement/orientation or motion dynamics, and it is infeasible to collect labelled data from every domain. To overcome the challenges of domain adaptation on long senso…

    Submitted 4 October, 2018; originally announced October 2018.

    Comments: NIPS 2018 workshop on Modeling the Physical World: Perception, Learning, and Control. A complete version will be released soon

  41. arXiv:1808.00508

    cs.NE

    Neural Arithmetic Logic Units

    Authors: Andrew Trask, Felix Hill, Scott Reed, Jack Rae, Chris Dyer, Phil Blunsom

    Abstract: Neural networks can learn to represent and manipulate numerical information, but they seldom generalize well outside of the range of numerical values encountered during training. To encourage more systematic numerical extrapolation, we propose an architecture that represents numerical quantities as linear activations which are manipulated using primitive arithmetic operators, controlled by learned…

    Submitted 1 August, 2018; originally announced August 2018.

  42. arXiv:1807.01670

    cs.CL cs.AI cs.CV cs.LG

    Encoding Spatial Relations from Natural Language

    Authors: Tiago Ramalho, Tomáš Kočiský, Frederic Besse, S. M. Ali Eslami, Gábor Melis, Fabio Viola, Phil Blunsom, Karl Moritz Hermann

    Abstract: Natural language processing has made significant inroads into learning the semantics of words through distributional approaches, however representations learnt via these methods fail to capture certain kinds of information implicit in the real world. In particular, spatial relations are encoded in a way that is inconsistent with human spatial reasoning and lacking invariance to viewpoint changes.…

    Submitted 5 July, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

  43. arXiv:1805.09208

    stat.ML cs.CL cs.LG

    Pushing the bounds of dropout

    Authors: Gábor Melis, Charles Blundell, Tomáš Kočiský, Karl Moritz Hermann, Chris Dyer, Phil Blunsom

    Abstract: We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that…

    Submitted 27 September, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

  44. arXiv:1712.07040

    cs.CL cs.AI cs.NE

    The NarrativeQA Reading Comprehension Challenge

    Authors: Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette

    Abstract: Reading comprehension (RC)---in contrast to information retrieval---requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecti…

    Submitted 19 December, 2017; originally announced December 2017.

  45. arXiv:1710.09867

    cs.CL cs.AI cs.NE

    Understanding Early Word Learning in Situated Artificial Agents

    Authors: Felix Hill, Stephen Clark, Karl Moritz Hermann, Phil Blunsom

    Abstract: Neural network-based systems can now learn to locate the referents of words and phrases in images, answer questions about visual scenes, and execute symbolic instructions as first-person actors in partially-observable worlds. To achieve this so-called grounded language learning, models must overcome challenges that infants face when learning their first words. While it is notable that models with…

    Submitted 1 October, 2019; v1 submitted 26 October, 2017; originally announced October 2017.

  46. arXiv:1707.05589

    cs.CL

    On the State of the Art of Evaluation in Neural Language Models

    Authors: Gábor Melis, Chris Dyer, Phil Blunsom

    Abstract: Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods w…

    Submitted 20 November, 2017; v1 submitted 18 July, 2017; originally announced July 2017.

  47. arXiv:1706.06551

    cs.CL cs.LG stat.ML

    Grounded Language Learning in a Simulated 3D World

    Authors: Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max Jaderberg, Denis Teplyashin, Marcus Wainwright, Chris Apps, Demis Hassabis, Phil Blunsom

    Abstract: We are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf. This creates a pressing need for general means to communicate with, instruct and guide artificial agents, with human language the most compelling means for such communication. To achieve this in a scalable fashion, agents must be able to relate language to the world and to…

    Submitted 26 June, 2017; v1 submitted 20 June, 2017; originally announced June 2017.

    Comments: 16 pages, 8 figures

  48. arXiv:1706.00359

    cs.CL cs.AI cs.IR cs.LG

    Discovering Discrete Latent Topics with Neural Variational Inference

    Authors: Yishu Miao, Edward Grefenstette, Phil Blunsom

    Abstract: Topic models have been widely explored as probabilistic generative models of documents. Traditional inference methods have sought closed-form derivations for updating the models, however as the expressiveness of these models grows, so does the difficulty of performing fast and accurate inference over their parameters. This paper presents alternative neural approaches to topic modelling by providin…

    Submitted 21 May, 2018; v1 submitted 1 June, 2017; originally announced June 2017.

    Comments: ICML 2017

  49. arXiv:1705.10229

    cs.CL cs.LG cs.NE stat.ML

    Latent Intention Dialogue Models

    Authors: Tsung-Hsien Wen, Yishu Miao, Phil Blunsom, Steve Young

    Abstract: Developing a dialogue agent that is capable of making autonomous decisions and communicating by natural language is one of the long-term goals of machine learning research. Traditional approaches either rely on hand-crafting a small state-action set for applying reinforcement learning that is not scalable or constructing deterministic models for learning dialogue sentences that fail to capture nat…

    Submitted 29 May, 2017; originally announced May 2017.

    Comments: Accepted at ICML 2017

  50. arXiv:1705.04146

    cs.AI cs.CL cs.LG

    Program Induction by Rationale Generation : Learning to Solve and Explain Algebraic Word Problems

    Authors: Wang Ling, Dani Yogatama, Chris Dyer, Phil Blunsom

    Abstract: Solving algebraic word problems requires executing a series of arithmetic operations---a program---to obtain a final answer. However, since programs can be arbitrarily complicated, inducing them directly from question-answer pairs is a formidable challenge. To make this task more feasible, we solve these problems by generating answer rationales, sequences of natural language and human-readable mat…

    Submitted 23 October, 2017; v1 submitted 11 May, 2017; originally announced May 2017.