
Showing 1–41 of 41 results for author: Kaiser, Ł

Searching in archive cs.
  1. arXiv:2403.05713  [pdf, other]

    cs.LG

    tsGT: Stochastic Time Series Modeling With Transformer

    Authors: Łukasz Kuciński, Witold Drzewakowski, Mateusz Olko, Piotr Kozakowski, Łukasz Maziarka, Marta Emilia Nowakowska, Łukasz Kaiser, Piotr Miłoś

    Abstract: Time series methods are of fundamental importance in virtually any field of science that deals with temporally structured data. Recently, there has been a surge of deterministic transformer models with time series-specific architectural biases. In this paper, we go in a different direction by introducing tsGT, a stochastic time series model built on a general-purpose transformer architecture. We f…

    Submitted 3 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2402.02304  [pdf, other]

    math.AP cs.LG

    Efficient Numerical Wave Propagation Enhanced By An End-to-End Deep Learning Model

    Authors: Luis Kaiser, Richard Tsai, Christian Klingenberg

    Abstract: Recent advances in wave modeling use sufficiently accurate fine solver outputs to train a neural network that enhances the accuracy of a fast but inaccurate coarse solver. In this paper we build upon the work of Nguyen and Tsai (2023) and present a novel unified system that integrates a numerical solver with a deep learning component into an end-to-end framework. In the proposed setting, we invest…

    Submitted 13 February, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  3. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  4. arXiv:2111.12763  [pdf, other]

    cs.LG cs.CL

    Sparse is Enough in Scaling Transformers

    Authors: Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Łukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva

    Abstract: Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that their use and study becomes out of reach. We address this problem by leveraging sparsity. We study sparse variants for all layers in the Transformer and propose Scaling Transformers, a family of next generation Transformer models that use sparse layers to sca…

    Submitted 24 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021

  5. arXiv:2111.03728  [pdf]

    cs.AI

    Shared Model of Sense-making for Human-Machine Collaboration

    Authors: Gheorghe Tecuci, Dorin Marcu, Louis Kaiser, Mihai Boicu

    Abstract: We present a model of sense-making that greatly facilitates the collaboration between an intelligent analyst and a knowledge-based agent. It is a general model grounded in the science of evidence and the scientific method of hypothesis generation and testing, where sense-making hypotheses that explain an observation are generated, relevant evidence is then discovered, and the hypotheses are tested…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: Presented at AAAI FSS-21: Artificial Intelligence in Government and Public Sector, Washington, DC, USA

  6. arXiv:2110.14168  [pdf, other]

    cs.LG cs.CL

    Training Verifiers to Solve Math Word Problems

    Authors: Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman

    Abstract: State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures of current models and support research, we introduce GSM8K, a dataset of 8.5K high quality linguistically diverse grade school math word problems. We find that even the largest transformer models fail to achieve high tes…

    Submitted 17 November, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

  7. arXiv:2110.13711  [pdf, other]

    cs.LG cs.CL

    Hierarchical Transformers Are More Efficient Language Models

    Authors: Piotr Nawrot, Szymon Tworkowski, Michał Tyrolski, Łukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski

    Abstract: Transformer models yield impressive results on many NLP and sequence modeling tasks. Remarkably, Transformers can handle long sequences which allows them to produce long coherent outputs: full paragraphs produced by GPT-3 or well-structured images produced by DALL-E. These large language models are impressive but also very inefficient and costly, which limits their applications and accessibility.…

    Submitted 16 April, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

  8. arXiv:2107.03374  [pdf, other]

    cs.LG

    Evaluating Large Language Models Trained on Code

    Authors: Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, et al. (33 additional authors not shown)

    Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J sol…

    Submitted 14 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: corrected typos, added references, added authors, added acknowledgements
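    HumanEval scores such as the 28.8% above measure functional correctness: a program counts only if it passes the problem's unit tests. The Codex paper reports such results as pass@k and gives an unbiased estimator for it: draw n samples per problem, count the c that pass, and compute 1 - C(n-c, k)/C(n, k). A minimal sketch of that formula follows; the function names are hypothetical and not taken from any released evaluation harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: budget (k <= n).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    # 1 - P(all k samples drawn without replacement fail).
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem estimates; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

    For example, with n = 10, c = 3 and k = 5 the estimate is 1 - 21/252 = 11/12.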

  9. arXiv:2102.06782  [pdf, other]

    cs.LG

    Q-Value Weighted Regression: Reinforcement Learning with Limited Data

    Authors: Piotr Kozakowski, Łukasz Kaiser, Henryk Michalewski, Afroz Mohiuddin, Katarzyna Kańska

    Abstract: Sample efficiency and performance in the offline setting have emerged as significant challenges of deep reinforcement learning. We introduce Q-Value Weighted Regression (QWR), a simple RL algorithm that excels in these aspects. QWR is an extension of Advantage Weighted Regression (AWR), an off-policy actor-critic algorithm that performs very well on continuous control tasks, also in the offline se…

    Submitted 12 February, 2021; originally announced February 2021.

  10. arXiv:2009.14794  [pdf, other]

    cs.LG cs.CL stat.ML

    Rethinking Attention with Performers

    Authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller

    Abstract: We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random featu…

    Submitted 19 November, 2022; v1 submitted 30 September, 2020; originally announced September 2020.

    Comments: Published as a conference paper + oral presentation at ICLR 2021. 38 pages. See https://github.com/google-research/google-research/tree/master/protein_lm for protein language model code, and https://github.com/google-research/google-research/tree/master/performer for Performer code. See https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html for the Google AI Blog post.
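    The idea of positive random features can be illustrated with a small Monte Carlo sketch. It assumes the positive feature map phi(x) = exp(-|x|^2/2) [exp(w_1·x), ..., exp(w_m·x)] / sqrt(m) with w_i drawn i.i.d. from N(0, I); these are plain i.i.d. draws, not the orthogonal variant the abstract names. Under that assumption phi(q)·phi(k) is an unbiased estimate of the softmax kernel exp(q·k). All function names are hypothetical.

```python
import math
import random


def sample_projections(m: int, d: int, rng: random.Random) -> list[list[float]]:
    """Draw m i.i.d. vectors w_i ~ N(0, I_d) (no orthogonalization)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def positive_features(x: list[float], ws: list[list[float]]) -> list[float]:
    """phi(x)_i = exp(w_i . x - |x|^2 / 2) / sqrt(m); every entry is positive."""
    sq_norm = sum(v * v for v in x)
    scale = 1.0 / math.sqrt(len(ws))
    return [
        scale * math.exp(sum(wj * xj for wj, xj in zip(w, x)) - sq_norm / 2.0)
        for w in ws
    ]


def approx_softmax_kernel(q: list[float], k: list[float], ws: list[list[float]]) -> float:
    """Monte Carlo estimate of exp(q . k) as the inner product phi(q) . phi(k)."""
    fq = positive_features(q, ws)
    fk = positive_features(k, ws)
    return sum(a * b for a, b in zip(fq, fk))


if __name__ == "__main__":
    rng = random.Random(0)
    q, k = [0.1, 0.2], [0.3, -0.1]
    ws = sample_projections(m=20000, d=2, rng=rng)
    exact = math.exp(sum(a * b for a, b in zip(q, k)))
    print(f"exact={exact:.4f} estimate={approx_softmax_kernel(q, k, ws):.4f}")
```

    Because every feature is positive, the estimated kernel values can never be negative. With features computed for all positions, attention can be regrouped as phi(Q)(phi(K)^T V) plus a matching normalizer, which is what makes the cost linear rather than quadratic in sequence length.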

  11. arXiv:2001.04451  [pdf, other]

    cs.LG cs.CL stat.ML

    Reformer: The Efficient Transformer

    Authors: Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya

    Abstract: Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from $O(L^2)$ to $O(L\log L)$, where $L$ is…

    Submitted 18 February, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

    Comments: ICLR 2020
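    Locality-sensitive hashing can be sketched with angular random-rotation hashing, h(x) = argmax([xR; -xR]), where R is a random Gaussian matrix with n_buckets/2 columns. The abstract above is truncated before the hashing details, so treat this scheme and every name below as an illustrative assumption. Vectors pointing in similar directions tend to share a bucket, which is what lets attention be restricted to pairs within a bucket.

```python
import random


def random_rotation(d: int, n_buckets: int, rng: random.Random) -> list[list[float]]:
    """Random Gaussian matrix R of shape (d, n_buckets // 2)."""
    if n_buckets % 2:
        raise ValueError("n_buckets must be even")
    return [[rng.gauss(0.0, 1.0) for _ in range(n_buckets // 2)] for _ in range(d)]


def lsh_bucket(x: list[float], r: list[list[float]]) -> int:
    """h(x) = argmax([xR ; -xR]), an index in [0, n_buckets)."""
    cols = len(r[0])
    proj = [sum(x[i] * r[i][j] for i in range(len(x))) for j in range(cols)]
    scores = proj + [-p for p in proj]
    return max(range(len(scores)), key=scores.__getitem__)


def group_by_bucket(vectors: list[list[float]], r: list[list[float]]) -> dict[int, list[int]]:
    """Map each bucket id to the positions of the vectors hashed into it."""
    buckets: dict[int, list[int]] = {}
    for pos, v in enumerate(vectors):
        buckets.setdefault(lsh_bucket(v, r), []).append(pos)
    return buckets
```

    Because the hash depends only on direction, rescaling a vector by a positive factor leaves its bucket unchanged, and negating it moves it to the bucket n_buckets/2 positions away.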

  12. arXiv:1906.04331  [pdf, other]

    cs.CL cs.LG

    Parallel Scheduled Sampling

    Authors: Daniel Duckworth, Arvind Neelakantan, Ben Goodrich, Lukasz Kaiser, Samy Bengio

    Abstract: Auto-regressive models are widely used in sequence generation problems. The output sequence is typically generated in a predetermined order, one discrete unit (pixel or word or character) at a time. The models are trained by teacher-forcing where ground-truth history is fed to the model as input, which at test time is replaced by the model prediction. Scheduled Sampling aims to mitigate this discr…

    Submitted 21 October, 2019; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: 2nd submission

  13. arXiv:1905.08836  [pdf, other]

    cs.CL

    Sample Efficient Text Summarization Using a Single Pre-Trained Transformer

    Authors: Urvashi Khandelwal, Kevin Clark, Dan Jurafsky, Lukasz Kaiser

    Abstract: Language model (LM) pre-training has resulted in impressive performance and sample efficiency on a variety of language understanding tasks. However, it remains unclear how to best use pre-trained LMs for generation tasks such as abstractive summarization, particularly to enhance sample efficiency. In these sequence-to-sequence settings, prior work has experimented with loading pre-trained weights…

    Submitted 21 May, 2019; originally announced May 2019.

  14. arXiv:1903.00374  [pdf, other]

    cs.LG stat.ML

    Model-Based Reinforcement Learning for Atari

    Authors: Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski

    Abstract: Model-free reinforcement learning (RL) can be used to learn effective policies for complex tasks, such as Atari games, even from image observations. However, this typically requires very large amounts of interaction -- substantially more, in fact, than a human would need to learn the same games. How can people learn so quickly? Part of the answer may be that people can learn how the game works and…

    Submitted 3 April, 2024; v1 submitted 1 March, 2019; originally announced March 2019.

  15. arXiv:1810.10126  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Area Attention

    Authors: Yang Li, Lukasz Kaiser, Samy Bengio, Si Si

    Abstract: Existing attention mechanisms are trained to attend to individual items in a collection (the memory) with a predefined, fixed granularity, e.g., a word token or an image grid. We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such…

    Submitted 7 May, 2020; v1 submitted 23 October, 2018; originally announced October 2018.

    Comments: @InProceedings{pmlr-v97-li19e, title = {Area Attention}, author = {Li, Yang and Kaiser, Lukasz and Bengio, Samy and Si, Si}, booktitle = {Proceedings of the 36th International Conference on Machine Learning}, pages = {3846--3855}, year = {2019}, volume = {97}, series = {Proceedings of Machine Learning Research}, publisher = {PMLR} }

    Journal ref: ICML 2019

  16. arXiv:1810.01541  [pdf]

    cs.AI

    Co-Arg: Cogent Argumentation with Crowd Elicitation

    Authors: Mihai Boicu, Dorin Marcu, Gheorghe Tecuci, Lou Kaiser, Chirag Uttamsingh, Navya Kalale

    Abstract: This paper presents Co-Arg, a new type of cognitive assistant to an intelligence analyst that enables the synergistic integration of analyst imagination and expertise, computer knowledge and critical reasoning, and crowd wisdom, to draw defensible and persuasive conclusions from masses of evidence of all types, in a world that is changing all the time. Co-Arg's goal is to improve the quality of th…

    Submitted 2 October, 2018; originally announced October 2018.

    Comments: Presented at AAAI FSS-18: Artificial Intelligence in Government and Public Sector, Arlington, Virginia, USA

  17. arXiv:1807.03819  [pdf, other]

    cs.CL cs.LG stat.ML

    Universal Transformers

    Authors: Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser

    Abstract: Recurrent neural networks (RNNs) sequentially process data by updating their state with each new data point, and have long been the de facto choice for sequence modeling tasks. However, their inherently sequential computation makes them slow to train. Feed-forward and convolutional architectures have recently been shown to achieve superior results on some sequence modeling tasks such as machine tr…

    Submitted 5 March, 2019; v1 submitted 10 July, 2018; originally announced July 2018.

    Comments: Published at ICLR 2019

  18. arXiv:1803.07416  [pdf, other]

    cs.LG cs.CL stat.ML

    Tensor2Tensor for Neural Machine Translation

    Authors: Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, Łukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, Jakob Uszkoreit

    Abstract: Tensor2Tensor is a library for deep learning models that is well-suited for neural machine translation and includes the reference implementation of the state-of-the-art Transformer model.

    Submitted 16 March, 2018; originally announced March 2018.

    Comments: arXiv admin note: text overlap with arXiv:1706.03762

  19. arXiv:1803.03382  [pdf, other]

    cs.LG

    Fast Decoding in Sequence Models using Discrete Latent Variables

    Authors: Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer

    Abstract: Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet and the Transformer attain state-of-the-art results on many tasks. However, they are difficult to parallelize and are thus slow at processing long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and Transformer are much more parallelizable during training, yet st…

    Submitted 7 June, 2018; v1 submitted 8 March, 2018; originally announced March 2018.

    Comments: ICML 2018

  20. arXiv:1802.05751  [pdf, other]

    cs.CV

    Image Transformer

    Authors: Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran

    Abstract: Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood. By…

    Submitted 15 June, 2018; v1 submitted 15 February, 2018; originally announced February 2018.

    Comments: Appears in International Conference on Machine Learning, 2018. Code available at https://github.com/tensorflow/tensor2tensor

  21. arXiv:1801.10198  [pdf, other]

    cs.CL

    Generating Wikipedia by Summarizing Long Sequences

    Authors: Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam Shazeer

    Abstract: We show that generating English Wikipedia articles can be approached as a multi-document summarization of source documents. We use extractive summarization to coarsely identify salient information and a neural abstractive model to generate the article. For the abstractive model, we introduce a decoder-only architecture that can scalably attend to very long sequences, much longer than typical enco…

    Submitted 30 January, 2018; originally announced January 2018.

    Comments: Published as a conference paper at ICLR 2018

  22. arXiv:1801.09797  [pdf, ps, other]

    cs.LG stat.ML

    Discrete Autoencoders for Sequence Models

    Authors: Łukasz Kaiser, Samy Bengio

    Abstract: Recurrent models for sequences have been recently successful at many tasks, especially for language modeling and machine translation. Nevertheless, it remains challenging to extract good representations from these models. For instance, even though language has a clear hierarchical structure going from characters through words to sentences, it is not apparent in current language models. We propose…

    Submitted 29 January, 2018; originally announced January 2018.

  23. arXiv:1801.04883  [pdf, other]

    cs.LG

    Unsupervised Cipher Cracking Using Discrete GANs

    Authors: Aidan N. Gomez, Sicong Huang, Ivan Zhang, Bryan M. Li, Muhammad Osama, Lukasz Kaiser

    Abstract: This work details CipherGAN, an architecture inspired by CycleGAN used for inferring the underlying cipher mapping given banks of unpaired ciphertext and plaintext. We demonstrate that CipherGAN is capable of cracking language data enciphered using shift and Vigenere ciphers to a high degree of fidelity and for vocabularies much larger than previously achieved. We present how CycleGAN can be made…

    Submitted 15 January, 2018; originally announced January 2018.

  24. arXiv:1706.05137  [pdf, other]

    cs.LG stat.ML

    One Model To Learn Them All

    Authors: Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit

    Abstract: Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrentl…

    Submitted 15 June, 2017; originally announced June 2017.

  25. arXiv:1706.03762  [pdf, other]

    cs.CL cs.LG

    Attention Is All You Need

    Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin

    Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experi…

    Submitted 1 August, 2023; v1 submitted 12 June, 2017; originally announced June 2017.

    Comments: 15 pages, 5 figures
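    The Transformer's core operation is scaled dot-product attention, softmax(QK^T/√d_k)V, as defined in the paper (the truncated abstract above does not spell it out). A minimal single-head sketch in plain Python, without masking or learned projections; the function names are our own:

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    q: list[list[float]], k: list[list[float]], v: list[list[float]]
) -> list[list[float]]:
    """softmax(Q K^T / sqrt(d_k)) V, with one row per query/key/value vector."""
    d_k = len(k[0])
    out = []
    for qi in q:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        # Output row: weighted average of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

    With two identical keys the weights are equal, so the output is the plain average of the value rows. Multi-head attention runs several such computations in parallel on learned projections of Q, K and V.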

  26. arXiv:1706.03059  [pdf, other]

    cs.CL cs.LG

    Depthwise Separable Convolutions for Neural Machine Translation

    Authors: Lukasz Kaiser, Aidan N. Gomez, Francois Chollet

    Abstract: Depthwise separable convolutions reduce the number of parameters and computation used in convolutional operations while increasing representational efficiency. They have been shown to be successful in image classification models, both in obtaining better models than previously possible for a given parameter count (the Xception architecture) and considerably reducing the number of parameters requir…

    Submitted 15 June, 2017; v1 submitted 9 June, 2017; originally announced June 2017.
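    The parameter savings can be made concrete with simple arithmetic. For a 1D convolution of width k over sequences (as in translation) mapping c_in to c_out channels, biases ignored, a regular convolution has k·c_in·c_out weights, while a depthwise separable one has k·c_in (one width-k filter per input channel) plus c_in·c_out (the 1×1 pointwise step). The helpers below are an illustrative sketch, not code from the paper:

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a regular 1D convolution of width k (no bias)."""
    return k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise (one width-k filter per input channel) plus pointwise 1x1 weights."""
    return k * c_in + c_in * c_out


if __name__ == "__main__":
    k, c = 3, 512
    regular = conv_params(k, c, c)
    separable = separable_conv_params(k, c, c)
    print(regular, separable, regular / separable)
```

    With k = 3 and 512 input and output channels this gives 786,432 versus 263,680 weights, roughly a 3× reduction; once channels are wide, the pointwise c_in·c_out term dominates the separable count.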

  27. arXiv:1703.03129  [pdf, other]

    cs.LG

    Learning to Remember Rare Events

    Authors: Łukasz Kaiser, Ofir Nachum, Aurko Roy, Samy Bengio

    Abstract: Despite recent advances, memory-augmented deep neural networks are still limited when it comes to life-long and one-shot learning, especially in remembering rare events. We present a large-scale life-long memory module for use in deep learning. The module exploits fast nearest-neighbor algorithms for efficiency and thus scales to large memory sizes. Except for the nearest-neighbor query, the modul…

    Submitted 8 March, 2017; originally announced March 2017.

    Comments: Conference paper accepted for ICLR'17

  28. arXiv:1701.06548  [pdf, other]

    cs.NE cs.LG

    Regularizing Neural Networks by Penalizing Confident Output Distributions

    Authors: Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, Geoffrey Hinton

    Abstract: We systematically explore regularizing neural networks by penalizing low entropy output distributions. We show that penalizing low entropy output distributions, which has been shown to improve exploration in reinforcement learning, acts as a strong regularizer in supervised learning. Furthermore, we connect a maximum entropy based confidence penalty to label smoothing through the direction of the…

    Submitted 23 January, 2017; originally announced January 2017.

    Comments: Submitted to ICLR 2017
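    A confidence penalty of this kind subtracts a weighted entropy term from the usual negative log-likelihood, L = -log p(y|x) - β·H(p(·|x)), so that higher-entropy (less confident) output distributions are favored. A minimal single-example sketch; the value of β and the function names are illustrative choices:

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: list[float]) -> float:
    """Shannon entropy in nats; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def confidence_penalized_loss(logits: list[float], target: int, beta: float) -> float:
    """Negative log-likelihood of `target` minus beta times the output entropy."""
    p = softmax(logits)
    return -math.log(p[target]) - beta * entropy(p)
```

    With β = 0.5 and uniform logits over four classes, the loss is (1 - β)·log 4 = log 2; β = 0 recovers plain cross-entropy.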

  29. arXiv:1610.08613  [pdf, ps, other]

    cs.LG cs.CL

    Can Active Memory Replace Attention?

    Authors: Łukasz Kaiser, Samy Bengio

    Abstract: Several mechanisms to focus attention of a neural network on selected parts of its input or memory have been used successfully in deep learning models in recent years. Attention has improved image classification, image captioning, speech recognition, generative models, and learning algorithmic tasks, but it had probably the largest impact on neural machine translation. Recently, similar improvem…

    Submitted 6 March, 2017; v1 submitted 27 October, 2016; originally announced October 2016.

  30. arXiv:1609.08144  [pdf, other]

    cs.CL cs.AI cs.LG

    Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

    Authors: Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, et al. (6 additional authors not shown)

    Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NM…

    Submitted 8 October, 2016; v1 submitted 26 September, 2016; originally announced September 2016.

  31. arXiv:1609.02664  [pdf, ps, other]

    cs.LG cs.LO

    Machine Learning with Guarantees using Descriptive Complexity and SMT Solvers

    Authors: Charles Jordan, Łukasz Kaiser

    Abstract: Machine learning is a thriving part of computer science. There are many efficient approaches to machine learning that do not provide strong theoretical guarantees, and a beautiful general learning theory. Unfortunately, machine learning approaches that give strong theoretical guarantees have not been efficient enough to be applicable. In this paper we introduce a logical approach to machine learni…

    Submitted 9 September, 2016; originally announced September 2016.

  32. arXiv:1603.04467  [pdf, other]

    cs.DC cs.LG

    TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

    Authors: Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mane, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, et al. (15 additional authors not shown)

    Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational de…

    Submitted 16 March, 2016; v1 submitted 14 March, 2016; originally announced March 2016.

    Comments: Version 2 updates only the metadata, to correct the formatting of Martín Abadi's name

  33. arXiv:1511.08228  [pdf, ps, other]

    cs.LG cs.NE

    Neural GPUs Learn Algorithms

    Authors: Łukasz Kaiser, Ilya Sutskever

    Abstract: Learning an algorithm from examples is a fundamental problem that has been widely studied. Recently it has been addressed using neural networks, in particular by Neural Turing Machines (NTMs). These are fully differentiable computers that use backpropagation to learn their own programming. Despite their appeal NTMs have a weakness that is caused by their sequential nature: they are not parallel an…

    Submitted 14 March, 2016; v1 submitted 25 November, 2015; originally announced November 2015.

  34. arXiv:1511.06807  [pdf, other]

    stat.ML cs.LG

    Adding Gradient Noise Improves Learning for Very Deep Networks

    Authors: Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens

    Abstract: Deep feedforward and recurrent networks have achieved impressive results in many perception and language processing applications. This success is partially attributed to architectural innovations such as convolutional and long short-term memory networks. The main motivation for these architectural innovations is that they capture better domain knowledge, and importantly are easier to optimize than…

    Submitted 20 November, 2015; originally announced November 2015.

  35. arXiv:1511.06114  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    Multi-task Sequence to Sequence Learning

    Authors: Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser

    Abstract: Sequence to sequence learning has recently emerged as a new paradigm in supervised learning. To date, most of its applications focused on only one task and not much work explored this framework for multiple tasks. This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the one-to-many setting - where the encoder is shared between several tasks such as machi…

    Submitted 1 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: 10 pages, 4 figures, ICLR 2016 camera-ready, added parsing SOTA results

  36. arXiv:1412.7449  [pdf, other]

    cs.CL cs.LG stat.ML

    Grammar as a Foreign Language

    Authors: Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton

    Abstract: Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain specific, complex, and inefficient. In this paper we show that the domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used…

    Submitted 9 June, 2015; v1 submitted 23 December, 2014; originally announced December 2014.

  37. arXiv:1408.4745  [pdf, ps, other]

    cs.DM math.CO

    Directed Width Measures and Monotonicity of Directed Graph Searching

    Authors: Łukasz Kaiser, Stephan Kreutzer, Roman Rabinovich, Sebastian Siebertz

    Abstract: We consider generalisations of tree width to directed graphs, that attracted much attention in the last fifteen years. About their relative strength with respect to "bounded width in one measure implies bounded width in the other" many problems remain unsolved. Only some results separating directed width measures are known. We give an almost complete picture of this relation. For this, we consider…

    Submitted 20 August, 2014; originally announced August 2014.

    MSC Class: 68R10

  38. Model Checking the Quantitative mu-Calculus on Linear Hybrid Systems

    Authors: Diana Fischer, Lukasz Kaiser

    Abstract: We study the model-checking problem for a quantitative extension of the modal mu-calculus on a class of hybrid systems. Qualitative model checking has been proved decidable and implemented for several classes of systems, but this is not the case for quantitative questions that arise naturally in this context. Recently, quantitative formalisms that subsume classical temporal logics and allow the m…

    Submitted 19 September, 2012; v1 submitted 8 September, 2012; originally announced September 2012.

    Comments: LMCS submission

    ACM Class: D.2.4, F.4.1

    Journal ref: Logical Methods in Computer Science, Volume 8, Issue 3 (September 20, 2012) lmcs:760

  39. Degrees of Lookahead in Regular Infinite Games

    Authors: Michael Holtmann, Lukasz Kaiser, Wolfgang Thomas

    Abstract: We study variants of regular infinite games where the strict alternation of moves between the two players is subject to modifications. The second player may postpone a move for a finite number of steps, or, in other words, exploit in his strategy some lookahead on the moves of the opponent. This captures situations in distributed systems, e.g. when buffers are present in communication or when sig…

    Submitted 25 September, 2012; v1 submitted 4 September, 2012; originally announced September 2012.

    Comments: LMCS submission

    ACM Class: D.2.4

    Journal ref: Logical Methods in Computer Science, Volume 8, Issue 3 (September 27, 2012) lmcs:922

  40. arXiv:0802.2871  [pdf, ps, other]

    cs.LO cs.GT

    Model Checking Games for the Quantitative mu-Calculus

    Authors: Diana Fischer, Erich Grädel, Lukasz Kaiser

    Abstract: We investigate quantitative extensions of modal logic and the modal mu-calculus, and study the question whether the tight connection between logic and games can be lifted from the qualitative logics to their quantitative counterparts. It turns out that, if the quantitative mu-calculus is defined in an appropriate way respecting the duality properties between the logical operators, then its model…

    Submitted 20 February, 2008; originally announced February 2008.

    Journal ref: In Proceedings of the 25th Annual Symposium on the Theoretical Aspects of Computer Science (STACS 2008), Bordeaux, France (2008)

  41. arXiv:0802.2866  [pdf, ps, other]

    cs.LO

    Cardinality and counting quantifiers on omega-automatic structures

    Authors: Lukasz Kaiser, Sasha Rubin, Vince Bárány

    Abstract: We investigate structures that can be represented by omega-automata, so-called omega-automatic structures, and prove that relations defined over such structures in first-order logic expanded by the first-order quantifiers 'there exist at most $\aleph_0$ many', 'there exist finitely many' and 'there exist $k$ modulo $m$ many' are omega-regular. The proof identifies certain algebraic properties of…

    Submitted 20 February, 2008; originally announced February 2008.

    Journal ref: In Proceedings of the 25th Annual Symposium on the Theoretical Aspects of Computer Science (STACS 2008), Bordeaux, France (2008)