
Showing 1–25 of 25 results for author: Brennan, J

Searching in archive cs.
  1. arXiv:2408.07009

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google: Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh, et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  2. arXiv:2404.14068

    cs.AI cs.LG

    Holistic Safety and Responsibility Evaluations of Advanced AI Models

    Authors: Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Mikel Rodriguez, Jennifer Beroshi, Dawn Bloxwich, Lev Proleev, Jilin Chen, Sebastian Farquhar, Lewis Ho, Iason Gabriel, Allan Dafoe, William Isaac

    Abstract: Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice. In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation. In this report, we summarise and share elements of our evolving approach as well as lessons learned for a broad audience. Key lessons learned…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages excluding bibliography

  3. arXiv:2404.07839

    cs.LG cs.AI cs.CL

    RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

    Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

    Abstract: We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-tr…

    Submitted 28 August, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  4. arXiv:2403.17299

    cs.CL q-bio.NC

    Decoding Probing: Revealing Internal Linguistic Structures in Neural Language Models using Minimal Pairs

    Authors: Linyang He, Peili Chen, Ercong Nie, Yuanning Li, Jonathan R. Brennan

    Abstract: Inspired by cognitive neuroscience studies, we introduce a novel 'decoding probing' method that uses minimal pairs benchmark (BLiMP) to probe internal linguistic characteristics in neural language models layer by layer. By treating the language model as the 'brain' and its representations as 'neural activations', we decode grammaticality labels of minimal pairs from the intermediate layers' repres…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024

  5. arXiv:2403.08295

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  6. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  7. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  8. arXiv:2306.09871

    cs.HC cs.AI cs.CY

    Going public: the role of public participation approaches in commercial AI labs

    Authors: Lara Groves, Aidan Peppin, Andrew Strait, Jenny Brennan

    Abstract: In recent years, discussions of responsible AI practices have seen growing support for "participatory AI" approaches, intended to involve members of the public in the design and development of AI systems. Prior research has identified a lack of standardised methods or approaches for how to use participatory approaches in the AI development process. At present, there is a dearth of evidence on atti…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted to 2023 FAccT Conference

  9. arXiv:2306.00485

    stat.ME cs.LG econ.EM

    Causal Estimation of User Learning in Personalized Systems

    Authors: Evan Munro, David Jones, Jennifer Brennan, Roland Nelet, Vahab Mirrokni, Jean Pouget-Abadie

    Abstract: In online platforms, the impact of a treatment on an observed outcome may change over time as 1) users learn about the intervention, and 2) the system personalization, such as individualized recommendations, changes over time. We introduce a non-parametric causal model of user actions in a personalized system. We show that the Cookie-Cookie-Day (CCD) experiment, designed for the measurement of the…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: EC 2023

  10. arXiv:2210.16147

    cs.CL

    Modeling structure-building in the brain with CCG parsing and large language models

    Authors: Miloš Stanojević, Jonathan R. Brennan, Donald Dunagan, Mark Steedman, John T. Hale

    Abstract: To model behavioral and neural correlates of language comprehension in naturalistic environments, researchers have turned to broad-coverage tools from natural-language processing and machine learning. Where syntactic structure is explicitly modeled, prior work has relied predominantly on context-free grammars (CFG), yet such formalisms are not sufficiently expressive for human languages. Combinator…

    Submitted 16 April, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

  11. arXiv:2202.06639

    cs.CV cs.LG

    On the Complexity of Object Detection on Real-world Public Transportation Images for Social Distancing Measurement

    Authors: Nik Khadijah Nik Aznan, John Brennan, Daniel Bell, Jennine Jonczyk, Paul Watson

    Abstract: Social distancing in public spaces has become an essential aspect in helping to reduce the impact of the COVID-19 pandemic. Exploiting recent advances in machine learning, there have been many studies in the literature implementing social distancing via object detection through the use of surveillance cameras in public spaces. However, to date, there has been no study of social distance measuremen…

    Submitted 14 February, 2022; originally announced February 2022.

  12. arXiv:2202.02405

    cs.LG stat.ML

    BAM: Bayes with Adaptive Memory

    Authors: Josue Nassar, Jennifer Brennan, Ben Evans, Kendall Lowrey

    Abstract: Online learning via Bayes' theorem allows new data to be continuously integrated into an agent's current beliefs. However, a naive application of Bayesian methods in non-stationary environments leads to slow adaptation and results in state estimates that may converge confidently to the wrong parameter value. A common solution when learning in changing environments is to discard/downweight past dat…

    Submitted 8 February, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

    Comments: International Conference on Learning Representations (ICLR), 2022

  13. Deploying Containerized QuantEx Quantum Simulation Software on HPC Systems

    Authors: David Brayford, John Brennan, Momme Allalen, Kenneth Hanley, Luigi Iapichino, Lee ORiordan, Niall Moran

    Abstract: The simulation of quantum circuits using the tensor network method is very computationally demanding and requires significant High Performance Computing (HPC) resources to find an efficient contraction order and to perform the contraction of the large tensor networks. In addition, the researchers want a workflow that is easy to customize, reproduce and migrate to different HPC systems. In this pap…

    Submitted 11 October, 2021; originally announced October 2021.

    Journal ref: 2021 3rd International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC)

  14. arXiv:2106.08298

    cs.HC cs.AI cs.CL cs.CY

    StockBabble: A Conversational Financial Agent to support Stock Market Investors

    Authors: Suraj Sharma, Joseph Brennan, Jason R. C. Nurse

    Abstract: We introduce StockBabble, a conversational agent designed to support understanding and engagement with the stock market. StockBabble's value and novelty is in its ability to empower retail investors -- many of whom may be new to investing -- and supplement their informational needs using a user-friendly agent. Users have the ability to query information on companies to retrieve a general and fina…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: CUI 2021 - 3rd Conference on Conversational User Interfaces

  15. arXiv:2010.12635

    cs.LG cs.PF

    Not Half Bad: Exploring Half-Precision in Graph Convolutional Neural Networks

    Authors: John Brennan, Stephen Bonner, Amir Atapour-Abarghouei, Philip T Jackson, Boguslaw Obara, Andrew Stephen McGough

    Abstract: With the growing significance of graphs as an effective representation of data in numerous applications, efficient graph analysis using modern machine learning is receiving a growing level of attention. Deep learning approaches often operate over the entire adjacency matrix -- as the input and intermediate network layers are all designed in proportion to the size of the adjacency matrix -- leading…

    Submitted 23 October, 2020; originally announced October 2020.

  16. arXiv:2009.14374

    cs.SD eess.AS

    Rethinking Evaluation Methodology for Audio-to-Score Alignment

    Authors: John Thickstun, Jennifer Brennan, Harsh Verma

    Abstract: This paper offers a precise, formal definition of an audio-to-score alignment. While the concept of an alignment is intuitively grasped, this precision affords us new insight into the evaluation of audio-to-score alignment algorithms. Motivated by these insights, we introduce new evaluation metrics for audio-to-score alignment. Using an alignment evaluation dataset derived from pairs of KernScores…

    Submitted 29 September, 2020; originally announced September 2020.

    Comments: 10 pages, 6 figures

  17. arXiv:2006.09616

    cs.LG cs.PL stat.ML

    Dynamic Tensor Rematerialization

    Authors: Marisa Kirisame, Steven Lyubomirsky, Altan Haan, Jennifer Brennan, Mike He, Jared Roesch, Tianqi Chen, Zachary Tatlock

    Abstract: Checkpointing enables the training of deep learning models under restricted memory budgets by freeing intermediate activations from memory and recomputing them on demand. Current checkpointing techniques statically plan these recomputations offline and assume static computation graphs. We demonstrate that a simple online algorithm can achieve comparable performance by introducing Dynamic Tensor Re…

    Submitted 18 March, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: 31 pages, 12 figures, implementation available here: https://github.com/uwsampl/dtr-prototype, OpenReview: https://openreview.net/forum?id=Vfs_2RnOD0H

    ACM Class: C.3

  18. arXiv:2002.07297

    stat.ML cs.LG

    Estimating the number and effect sizes of non-null hypotheses

    Authors: Jennifer Brennan, Ramya Korlakai Vinayak, Kevin Jamieson

    Abstract: We study the problem of estimating the distribution of effect sizes (the mean of the test statistic under the alternate hypothesis) in a multiple testing setting. Knowing this distribution allows us to calculate the power (type II error) of any experimental design. We show that it is possible to estimate this distribution using an inexpensive pilot experiment, which takes significantly fewer sampl…

    Submitted 24 July, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  19. arXiv:2001.06472

    cs.LG math.OC stat.ML

    Gradient descent with momentum --- to accelerate or to super-accelerate?

    Authors: Goran Nakerst, John Brennan, Masudul Haque

    Abstract: We consider gradient descent with 'momentum', a widely used method for loss function minimization in machine learning. This method is often used with 'Nesterov acceleration', meaning that the gradient is evaluated not at the current position in parameter space, but at the estimated position after one step. In this work, we show that the algorithm can be improved by extending this 'acceleration' --…

    Submitted 17 January, 2020; originally announced January 2020.

    Comments: 19 pages + references, 8 figures. A variant of Nesterov acceleration is proposed and studied

  20. arXiv:1908.08402

    cs.SI

    Temporal Neighbourhood Aggregation: Predicting Future Links in Temporal Graphs via Recurrent Variational Graph Convolutions

    Authors: Stephen Bonner, Amir Atapour-Abarghouei, Philip T Jackson, John Brennan, Ibad Kureshi, Georgios Theodoropoulos, Andrew Stephen McGough, Boguslaw Obara

    Abstract: Graphs have become a crucial way to represent large, complex and often temporal datasets across a wide range of scientific disciplines. However, when graphs are used as input to machine learning models, this rich temporal information is frequently disregarded during the learning process, resulting in suboptimal performance on certain temporal inference tasks. To combat this, we introduce Temporal N…

    Submitted 21 November, 2019; v1 submitted 21 August, 2019; originally announced August 2019.

    Comments: IEEE International Conference on Big Data 2019

  21. arXiv:1811.11880

    cs.LG cs.AI stat.ML

    Predicting the Computational Cost of Deep Learning Models

    Authors: Daniel Justus, John Brennan, Stephen Bonner, Andrew Stephen McGough

    Abstract: Deep learning is rapidly becoming a go-to tool for many artificial intelligence problems due to its ability to outperform other approaches and even humans at many problems. Despite its popularity we are still unable to accurately predict the time it will take to train a deep learning network to solve a given problem. This training time can be seen as the product of the training time per epoch and…

    Submitted 28 November, 2018; originally announced November 2018.

    Comments: Accepted for publication at the IEEE International Conference on Big Data, (C) IEEE

  22. arXiv:1811.08366

    cs.SI cs.LG

    Temporal Graph Offset Reconstruction: Towards Temporally Robust Graph Representation Learning

    Authors: Stephen Bonner, John Brennan, Ibad Kureshi, Georgios Theodoropoulos, Andrew Stephen McGough, Boguslaw Obara

    Abstract: Graphs are a commonly used construct for representing relationships between elements in complex high dimensional datasets. Many real-world phenomena are dynamic in nature, meaning that any graph used to represent them is inherently temporal. However, many of the machine learning models designed to capture knowledge about the structure of these graphs ignore this rich temporal information when cre…

    Submitted 20 November, 2018; originally announced November 2018.

    Comments: Accepted as a workshop paper at IEEE Big Data 2018

  23. arXiv:1810.08675

    cs.DC cs.LG

    Using Machine Learning to reduce the energy wasted in Volunteer Computing Environments

    Authors: A. Stephen McGough, Matthew Forshaw, John Brennan, Noura Al Moubayed, Stephen Bonner

    Abstract: High Throughput Computing (HTC) provides a convenient mechanism for running thousands of tasks. Many HTC systems exploit computers which are provisioned for other purposes by utilising their idle time - volunteer computing. This has great advantages as it gives access to vast quantities of computational power for little or no cost. The downside is that running tasks are sacrificed if the computer…

    Submitted 19 October, 2018; originally announced October 2018.

    Comments: Accepted for publication at the 9th International Green and Sustainable Computing Conference, technically co-sponsored by IEEE Computer Society & STC Sustainable Computing, October 22-24, Pittsburgh, PA, USA

  24. arXiv:1806.07464

    cs.LG stat.ML

    Exploring the Semantic Content of Unsupervised Graph Embeddings: An Empirical Study

    Authors: Stephen Bonner, Ibad Kureshi, John Brennan, Georgios Theodoropoulos, Andrew Stephen McGough, Boguslaw Obara

    Abstract: Graph embeddings have become a key and widely used technique within the field of graph mining, proving to be successful across a broad range of domains including social, citation, transportation and biological. Graph embedding techniques aim to automatically create a low-dimensional representation of a given graph, which captures key structural elements in the resulting embedding space. However, t…

    Submitted 19 June, 2018; originally announced June 2018.

  25. arXiv:1806.04127

    cs.CL

    Finding Syntax in Human Encephalography with Beam Search

    Authors: John Hale, Chris Dyer, Adhiguna Kuncoro, Jonathan R. Brennan

    Abstract: Recurrent neural network grammars (RNNGs) are generative models of (tree, string) pairs that rely on neural networks to evaluate derivational choices. Parsing with them using beam search yields a variety of incremental complexity metrics such as word surprisal and parser action count. When used as regressors against human electrophysiological responses to naturalistic text, they derive two amplitud…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: ACL 2018