Showing 1–21 of 21 results for author: Sanh, V

  1. arXiv:2408.12637  [pdf, other]

    cs.CV cs.AI

    Building and better understanding vision-language models: insights and future directions

    Authors: Hugo Laurençon, Andrés Marafioti, Victor Sanh, Léo Tronchon

    Abstract: The field of vision-language models (VLMs), which take images and texts as inputs and output texts, is rapidly evolving and has yet to reach consensus on several key aspects of the development pipeline, including data, architecture, and training methods. This paper can be seen as a tutorial for building a VLM. We begin by providing a comprehensive overview of the current state-of-the-art approache…

    Submitted 22 August, 2024; originally announced August 2024.

  2. arXiv:2406.16746  [pdf, other]

    cs.LG cs.AI cs.CL

    The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

    Authors: Shayne Longpre, Stella Biderman, Alon Albalak, Hailey Schoelkopf, Daniel McDuff, Sayash Kapoor, Kevin Klyman, Kyle Lo, Gabriel Ilharco, Nay San, Maribeth Rauh, Aviya Skowron, Bertie Vidgen, Laura Weidinger, Arvind Narayanan, Victor Sanh, David Adelani, Percy Liang, Rishi Bommasani, Peter Henderson, Sasha Luccioni, Yacine Jernite, Luca Soldaini

    Abstract: Foundation model development attracts a rapidly expanding body of contributors, scientists, and applications. To help shape responsible development practices, we introduce the Foundation Model Development Cheatsheet: a growing collection of 250+ tools and resources spanning text, vision, and speech modalities. We draw on a large body of prior work to survey resources (e.g. software, documentation,…

    Submitted 3 September, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2405.02246  [pdf, other]

    cs.CV cs.AI

    What matters when building vision-language models?

    Authors: Hugo Laurençon, Léo Tronchon, Matthieu Cord, Victor Sanh

    Abstract: The growing interest in vision-language models (VLMs) has been driven by improvements in large language models and vision transformers. Despite the abundance of literature on this subject, we observe that critical decisions regarding the design of VLMs are often not justified. We argue that these unsupported decisions impede progress in the field by making it difficult to identify which choices im…

    Submitted 3 May, 2024; originally announced May 2024.

  4. arXiv:2403.09029  [pdf, other]

    cs.HC cs.AI cs.CV

    Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

    Authors: Hugo Laurençon, Léo Tronchon, Victor Sanh

    Abstract: Using vision-language models (VLMs) in web development presents a promising strategy to increase efficiency and unblock no-code solutions: by providing a screenshot or a sketch of a UI, a VLM could generate the code to reproduce it, for instance in a language like HTML. Despite the advancements in VLMs for various tasks, the specific challenge of converting a screenshot into a corresponding HTML h…

    Submitted 13 March, 2024; originally announced March 2024.

  5. arXiv:2306.16527  [pdf, other]

    cs.IR cs.CV

    OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

    Authors: Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh

    Abstract: Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal benchmarks. However, the datasets used to train these models have not been released, and the collection process has not been fully specified. We introduce the OBELICS dataset, an open web-scale filtered dataset of interleaved image-text documen…

    Submitted 21 August, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

  6. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  7. arXiv:2210.15424  [pdf, other]

    cs.CL cs.AI cs.LG

    What Language Model to Train if You Have One Million GPU Hours?

    Authors: Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, M Saiful Bari, Stella Biderman, Hady Elsahar, Niklas Muennighoff, Jason Phang, Ofir Press, Colin Raffel, Victor Sanh, Sheng Shen, Lintang Sutawika, Jaesung Tae, Zheng Xin Yong, Julien Launay, Iz Beltagy

    Abstract: The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations can transfer across tasks and scale, increasing the impact of modeling research. However, with the emergence of state-of-the-art 100B+ parameters models, large language models are increasingly expensive to accurately design and train. Notabl…

    Submitted 7 November, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  8. arXiv:2208.07852  [pdf, other]

    cs.CL cs.HC cs.LG

    Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models

    Authors: Hendrik Strobelt, Albert Webson, Victor Sanh, Benjamin Hoover, Johanna Beyer, Hanspeter Pfister, Alexander M. Rush

    Abstract: State-of-the-art neural language models can now be used to solve ad-hoc language tasks through zero-shot prompting without the need for supervised training. This approach has gained popularity in recent years, and researchers have demonstrated prompts that achieve strong accuracy on specific NLP tasks. However, finding a prompt for new tasks requires experimentation. Different prompt templates wit…

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: 9 pages content, 2 pages references

  9. arXiv:2202.01279  [pdf, other]

    cs.LG cs.CL

    PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts

    Authors: Stephen H. Bach, Victor Sanh, Zheng-Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma, Taewoon Kim, M Saiful Bari, Thibault Fevry, Zaid Alyafeai, Manan Dey, Andrea Santilli, Zhiqing Sun, Srulik Ben-David, Canwen Xu, Gunjan Chhablani, Han Wang, Jason Alan Fries, Maged S. Al-shaibani, Shanya Sharma, Urmish Thakker, Khalid Almubarak, Xiangru Tang, Dragomir Radev, et al. (2 additional authors not shown)

    Abstract: PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. Using prompts to train and query language models is an emerging area in NLP that requires new tools that let users develop and refine these prompts collaboratively. PromptSource addresses the emergent challenges…

    Submitted 29 March, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: ACL 2022 Demo

  10. arXiv:2110.08207  [pdf, other]

    cs.LG cs.CL

    Multitask Prompted Training Enables Zero-Shot Task Generalization

    Authors: Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, et al. (16 additional authors not shown)

    Abstract: Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale,…

    Submitted 17 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: ICLR 2022 Spotlight (with extended discussion)

  11. arXiv:2109.04838  [pdf, other]

    cs.LG cs.CL

    Block Pruning For Faster Transformers

    Authors: François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush

    Abstract: Pre-training has improved model accuracy for both classification and generation tasks at the cost of introducing much larger and slower models. Pruning methods have proven to be an effective way of reducing model size, whereas distillation methods are proven for speeding up inference. We introduce a block pruning approach targeting both small and fast models. Our approach extends structured method…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021. Code, hyper-parameters, evaluation results and checkpoints available at https://github.com/huggingface/nn_pruning

    ACM Class: I.2.6; I.2.7

  12. arXiv:2109.04144  [pdf, other]

    cs.CL cs.AI

    Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning

    Authors: Prasetya Ajie Utama, Nafise Sadat Moosavi, Victor Sanh, Iryna Gurevych

    Abstract: Recent prompt-based approaches allow pretrained language models to achieve strong performances on few-shot finetuning by reformulating downstream tasks as a language modeling problem. In this work, we demonstrate that, despite its advantages on low data regimes, finetuned prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inference heuristics…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: Accepted at EMNLP 2021

  13. arXiv:2109.02846  [pdf, other]

    cs.CL

    Datasets: A Community Library for Natural Language Processing

    Authors: Quentin Lhoest, Albert Villanova del Moral, Yacine Jernite, Abhishek Thakur, Patrick von Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, Joe Davison, Mario Šaško, Gunjan Chhablani, Bhavitvya Malik, Simon Brandeis, Teven Le Scao, Victor Sanh, Canwen Xu, Nicolas Patry, Angelina McMillan-Major, Philipp Schmid, Sylvain Gugger, Clément Delangue, Théo Matussière, Lysandre Debut, et al. (7 additional authors not shown)

    Abstract: The scale, variety, and quantity of publicly-available NLP datasets has grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: EMNLP Demo 2021

  14. arXiv:2104.03514  [pdf, other]

    cs.CL

    Low-Complexity Probing via Finding Subnetworks

    Authors: Steven Cao, Victor Sanh, Alexander M. Rush

    Abstract: The dominant approach in probing neural networks for linguistic properties is to train a new shallow multi-layer perceptron (MLP) on top of the model's internal representations. This approach can detect properties encoded in the model, but at the cost of adding new parameters that may learn the task directly. We instead propose a subtractive pruning-based probe, where we find an existing subnetwor…

    Submitted 8 April, 2021; originally announced April 2021.

    Comments: NAACL-HLT 2021

  15. arXiv:2012.01300  [pdf, other]

    cs.CL cs.LG

    Learning from others' mistakes: Avoiding dataset biases without modeling them

    Authors: Victor Sanh, Thomas Wolf, Yonatan Belinkov, Alexander M. Rush

    Abstract: State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended underlying task. Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available. We consider cases where the bias issues may not be explicitly identified, and show a method for t…

    Submitted 2 December, 2020; originally announced December 2020.

    Comments: 15 pages, 6 figures, 6 tables

  16. arXiv:2011.14203  [pdf, other]

    cs.AR cs.CL

    EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference

    Authors: Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, Marco Donato, Victor Sanh, Paul N. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei

    Abstract: Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks. However, their hefty computational and memory demands make them challenging to deploy to resource-constrained edge platforms with strict latency requirements. We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimi…

    Submitted 5 September, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

    Comments: 12 pages plus references. Paper to appear at the 54th IEEE/ACM International Symposium on Microarchitecture (MICRO 2021)

  17. arXiv:2005.07683  [pdf, other]

    cs.CL cs.LG

    Movement Pruning: Adaptive Sparsity by Fine-Tuning

    Authors: Victor Sanh, Thomas Wolf, Alexander M. Rush

    Abstract: Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning.…

    Submitted 23 October, 2020; v1 submitted 15 May, 2020; originally announced May 2020.

    Comments: 14 pages, 6 figures, 3 tables. Published at NeurIPS 2020. Code: huggingface.co/mvp

  18. arXiv:1910.03771  [pdf, other]

    cs.CL

    HuggingFace's Transformers: State-of-the-art Natural Language Processing

    Authors: Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, Alexander M. Rush

    Abstract: Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the…

    Submitted 13 July, 2020; v1 submitted 8 October, 2019; originally announced October 2019.

    Comments: 8 pages, 4 figures, more details at https://github.com/huggingface/transformers

  19. arXiv:1910.01108  [pdf, other]

    cs.CL

    DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

    Authors: Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf

    Abstract: As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tu…

    Submitted 29 February, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

    Comments: February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted at the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019

  20. arXiv:1901.08149  [pdf, other]

    cs.CL

    TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents

    Authors: Thomas Wolf, Victor Sanh, Julien Chaumond, Clement Delangue

    Abstract: We introduce a new approach to generative data-driven dialogue systems (e.g. chatbots) called TransferTransfo which is a combination of a Transfer learning based training scheme and a high-capacity Transformer model. Fine-tuning is performed by using a multi-task objective which combines several unsupervised prediction tasks. The resulting fine-tuned model shows strong improvements over the curren…

    Submitted 4 February, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

    Comments: 6 pages, 2 figures, 2 tables, NeurIPS 2018 CAI Workshop

  21. arXiv:1811.06031  [pdf, other]

    cs.CL

    A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks

    Authors: Victor Sanh, Thomas Wolf, Sebastian Ruder

    Abstract: Much effort has been devoted to evaluate whether multi-task learning can be leveraged to learn rich representations that can be used in various Natural Language Processing (NLP) down-stream applications. However, there is still a lack of understanding of the settings in which multi-task learning has a significant effect. In this work, we introduce a hierarchical model trained in a multi-task learn…

    Submitted 26 November, 2018; v1 submitted 14 November, 2018; originally announced November 2018.

    Comments: 8 pages, 1 figure, To appear in Proceedings of AAAI 2019