Skip to main content

Showing 1–50 of 181 results for author: Wang, E

Searching in archive cs. Search in all archives.
.
  1. arXiv:2409.08299  [pdf, other

    physics.ed-ph cs.CY

    Using Peer-Customers to Scalably Pair Student Teams with Customers for Hands-on Curriculum Final Projects

    Authors: Edward Jay Wang

    Abstract: Peer-customer is a mechanism to pair student teams with customers in hands-on curriculum courses. Each student pitches a problem they want someone else in the class to solve for them. The use of peer-customers provides practical and scalable access for students to work with a customer on a real-world need for their final project. The peer-customer, despite being a student in the class, do not work… ▽ More

    Submitted 6 September, 2024; originally announced September 2024.

    ACM Class: K.3.2

  2. arXiv:2409.03733  [pdf, other

    cs.LG cs.AI cs.CL

    Planning In Natural Language Improves LLM Search For Code Generation

    Authors: Evan Wang, Federico Cassano, Catherine Wu, Yunfeng Bai, Will Song, Vaskar Nath, Ziwen Han, Sean Hendryx, Summer Yue, Hugh Zhang

    Abstract: While scaling training compute has led to remarkable improvements in large language models (LLMs), scaling inference compute has not yet yielded analogous gains. We hypothesize that a core missing component is a lack of diverse LLM outputs, leading to inefficient search due to models repeatedly sampling highly similar, yet incorrect generations. We empirically demonstrate that this lack of diversi… ▽ More

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2408.16629  [pdf, other

    cs.CY cs.AI cs.SI

    LLMs generate structurally realistic social networks but overestimate political homophily

    Authors: Serina Chang, Alicja Chaszczewicz, Emma Wang, Maya Josifovska, Emma Pierson, Jure Leskovec

    Abstract: Generating social networks is essential for many applications, such as epidemic modeling and social simulations. Prior approaches either involve deep learning models, which require many observed networks for training, or stylized models, which are limited in their realism and flexibility. In contrast, LLMs offer the potential for zero-shot and flexible network generation. However, two key question… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

  4. arXiv:2408.16027  [pdf, other

    cs.LG cs.AI cs.NI

    Toward Time-Continuous Data Inference in Sparse Urban CrowdSensing

    Authors: Ziyu Sun, Haoyang Su, Hanqi Sun, En Wang, Wenbin Liu

    Abstract: Mobile Crowd Sensing (MCS) is a promising paradigm that leverages mobile users and their smart portable devices to perform various real-world tasks. However, due to budget constraints and the inaccessibility of certain areas, Sparse MCS has emerged as a more practical alternative, collecting data from a limited number of target subareas and utilizing inference algorithms to complete the full sensi… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 11 pages, 11 figures

  5. arXiv:2408.12534  [pdf, other

    eess.IV cs.AI cs.CV

    Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge

    Authors: Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Ershuai Wang, Qin Zhou, Ziyan Huang, Pengju Lyu, Jian He, Bo Wang

    Abstract: Organ and cancer segmentation in abdomen Computed Tomography (CT) scans is the prerequisite for precise cancer diagnosis and treatment. Most existing benchmarks and algorithms are tailored to specific cancer types, limiting their ability to provide comprehensive cancer analysis. This work presents the first international competition on abdominal organ and pan-cancer segmentation by providing a lar… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: MICCAI 2024 FLARE Challenge Summary

  6. arXiv:2408.04226  [pdf, other

    cs.CL

    Evaluating Language Model Math Reasoning via Grounding in Educational Curricula

    Authors: Li Lucy, Tal August, Rose E. Wang, Luca Soldaini, Courtney Allison, Kyle Lo

    Abstract: Our work presents a novel angle for evaluating language models' (LMs) mathematical abilities, by investigating whether they can discern skills and concepts enabled by math content. We contribute two datasets: one consisting of 385 fine-grained descriptions of K-12 math skills and concepts, or standards, from Achieve the Core (ATC), and another of 9.9K problems labeled with these standards (MathFis… ▽ More

    Submitted 9 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 30 pages, 23 figures

  7. arXiv:2408.00118  [pdf, other

    cs.CL cs.AI

    Gemma 2: Improving Open Language Models at a Practical Size

    Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman , et al. (172 additional authors not shown)

    Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al… ▽ More

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  8. arXiv:2407.14142  [pdf, other

    cs.CV cs.LG

    Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation

    Authors: Zhengyuan Xie, Haiquan Lu, Jia-wen Xiao, Enguang Wang, Le Zhang, Xialei Liu

    Abstract: Class incremental semantic segmentation aims to preserve old knowledge while learning new tasks, however, it is impeded by catastrophic forgetting and background shift issues. Prior works indicate the pivotal importance of initializing new classifiers and mainly focus on transferring knowledge from the background classifier or preparing classifiers for future classes, neglecting the flexibility an… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  9. arXiv:2407.12366  [pdf, other

    cs.CV cs.AI cs.CL cs.RO

    NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

    Authors: Gengze Zhou, Yicong Hong, Zun Wang, Xin Eric Wang, Qi Wu

    Abstract: Capitalizing on the remarkable advancements in Large Language Models (LLMs), there is a burgeoning initiative to harness LLMs for instruction following robotic navigation. Such a trend underscores the potential of LLMs to generalize navigational reasoning and diverse language understanding. However, a significant discrepancy in agent performance is observed when integrating LLMs in the Vision-and-… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  10. arXiv:2407.11988  [pdf, other

    cs.CL

    Generating Harder Cross-document Event Coreference Resolution Datasets using Metaphoric Paraphrasing

    Authors: Shafiuddin Rehan Ahmed, Zhiyong Eric Wang, George Arthur Baker, Kevin Stowe, James H. Martin

    Abstract: The most popular Cross-Document Event Coreference Resolution (CDEC) datasets fail to convey the true difficulty of the task, due to the lack of lexical diversity between coreferring event triggers (words or phrases that refer to an event). Furthermore, there is a dearth of event datasets for figurative language, limiting a crucial avenue of research in event comprehension. We address these two iss… ▽ More

    Submitted 5 June, 2024; originally announced July 2024.

    Comments: Short Paper, ACL 2024

  11. arXiv:2407.08650  [pdf, other

    physics.med-ph cs.CV

    Latent Spaces Enable Transformer-Based Dose Prediction in Complex Radiotherapy Plans

    Authors: Edward Wang, Ryan Au, Pencilla Lang, Sarah A. Mattonen

    Abstract: Evidence is accumulating in favour of using stereotactic ablative body radiotherapy (SABR) to treat multiple cancer lesions in the lung. Multi-lesion lung SABR plans are complex and require significant resources to create. In this work, we propose a novel two-stage latent transformer framework (LDFormer) for dose prediction of lung SABR plans with varying numbers of lesions. In the first stage, pa… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  12. arXiv:2407.08219  [pdf, other

    cs.CL cs.HC

    Generating Contextually-Relevant Navigation Instructions for Blind and Low Vision People

    Authors: Zain Merchant, Abrar Anwar, Emily Wang, Souti Chattopadhyay, Jesse Thomason

    Abstract: Navigating unfamiliar environments presents significant challenges for blind and low-vision (BLV) individuals. In this work, we construct a dataset of images and goals across different scenarios such as searching through kitchens or navigating outdoors. We then investigate how grounded instruction generation methods can provide contextually-relevant navigational guidance to users in these instance… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted as RO-MAN 2024 Late Breaking Report

  13. arXiv:2406.19263  [pdf, other

    cs.CL cs.CV

    Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

    Authors: Yue Fan, Lei Ding, Ching-Chen Kuo, Shan Jiang, Yang Zhao, Xinze Guan, Jie Yang, Yi Zhang, Xin Eric Wang

    Abstract: Graphical User Interfaces (GUIs) are central to our interaction with digital devices. Recently, growing efforts have been made to build models for various GUI understanding tasks. However, these efforts largely overlook an important GUI-referring task: screen reading based on user-indicated points, which we name the Screen Point-and-Read (SPR) task. This task is predominantly handled by rigid acce… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  14. arXiv:2406.15762  [pdf, other

    cs.LG stat.ML

    Rethinking the Diffusion Models for Numerical Tabular Data Imputation from the Perspective of Wasserstein Gradient Flow

    Authors: Zhichao Chen, Haoxuan Li, Fangyikang Wang, Odin Zhang, Hu Xu, Xiaoyu Jiang, Zhihuan Song, Eric H. Wang

    Abstract: Diffusion models (DMs) have gained attention in Missing Data Imputation (MDI), but there remain two long-neglected issues to be addressed: (1). Inaccurate Imputation, which arises from inherently sample-diversification-pursuing generative process of DMs. (2). Difficult Training, which stems from intricate design required for the mask matrix in model training stage. To address these concerns within… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  15. arXiv:2406.15575  [pdf, ps, other

    cs.LG cs.AI stat.ML

    Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

    Authors: Mucong Ding, Tahseen Rabbani, Bang An, Evan Z Wang, Furong Huang

    Abstract: Graph Neural Networks (GNNs) are widely applied to graph learning problems such as node classification. When scaling up the underlying graphs of GNNs to a larger size, we are forced to either train on the complete graph and keep the full graph adjacency and node embeddings in memory (which is often infeasible) or mini-batch sample the graph (which results in exponentially growing computational com… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2022

  16. arXiv:2406.14288  [pdf, other

    cs.LG cs.AI

    Revisiting Modularity Maximization for Graph Clustering: A Contrastive Learning Perspective

    Authors: Yunfei Liu, Jintang Li, Yuehe Chen, Ruofan Wu, Ericbk Wang, Jing Zhou, Sheng Tian, Shuheng Shen, Xing Fu, Changhua Meng, Weiqiang Wang, Liang Chen

    Abstract: Graph clustering, a fundamental and challenging task in graph mining, aims to classify nodes in a graph into several disjoint clusters. In recent years, graph contrastive learning (GCL) has emerged as a dominant line of research in graph clustering and advances the new state-of-the-art. However, GCL-based methods heavily rely on graph augmentations and contrastive schemes, which may potentially in… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: KDD 2024 research track. Code available at https://github.com/EdisonLeeeee/MAGI

  17. arXiv:2406.12831  [pdf, other

    cs.CV cs.AI cs.MM

    VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

    Authors: Jing Gu, Yuwei Fang, Ivan Skorokhodov, Peter Wonka, Xinya Du, Sergey Tulyakov, Xin Eric Wang

    Abstract: Video editing stands as a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccurate and inconsistency edits in the spatiotemporal dimension, especially for long videos. In this paper, we introduce VIA, a unified spatiotemp… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 13 pages, 11 figures

  18. arXiv:2406.09305  [pdf, other

    cs.CV

    Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

    Authors: Yufan Zhou, Ruiyi Zhang, Kaizhi Zheng, Nanxuan Zhao, Jiuxiang Gu, Zichao Wang, Xin Eric Wang, Tong Sun

    Abstract: In subject-driven text-to-image generation, recent works have achieved superior performance by training the model on synthetic datasets containing numerous image pairs. Trained on these datasets, generative models can produce text-aligned images for specific subject from arbitrary testing image in a zero-shot manner. They even outperform methods which require additional fine-tuning on testing imag… ▽ More

    Submitted 7 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  19. arXiv:2406.08689  [pdf, other

    cs.CR cs.AI

    Security of AI Agents

    Authors: Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, Hao Chen

    Abstract: The study and development of AI agents have been boosted by large language models. AI agents can function as intelligent assistants and complete tasks on behalf of their users with access to tools and the ability to execute commands in their environments, Through studying and experiencing the workflow of typical AI agents, we have raised several concerns regarding their security. These potential v… ▽ More

    Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  20. arXiv:2406.08407  [pdf, other

    cs.CV cs.AI cs.CL

    MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    Authors: Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

    Abstract: Multimodal Language Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multi… ▽ More

    Submitted 29 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  21. arXiv:2406.06559  [pdf, other

    cs.CL cs.AI cs.LG

    Harnessing Business and Media Insights with Large Language Models

    Authors: Yujia Bao, Ankit Parag Shah, Neeru Narang, Jonathan Rivers, Rajeev Maksey, Lan Guan, Louise N. Barrere, Shelley Evenson, Rahul Basole, Connie Miao, Ankit Mehta, Fabien Boulay, Su Min Park, Natalie E. Pearson, Eldhose Joy, Tiger He, Sumiran Thakur, Koustav Ghosal, Josh On, Phoebe Morrison, Tim Major, Eva Siqi Wang, Gina Escobar, Jiaheng Wei, Tharindu Cyril Weerasooriya , et al. (8 additional authors not shown)

    Abstract: This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a curated knowledge base built from professional journalism, enabling it to deliver precise and in-depth answers to intricate business questions. Users… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  22. arXiv:2406.06316  [pdf, other

    cs.CL cs.AI cs.CE cs.LG

    Tx-LLM: A Large Language Model for Therapeutics

    Authors: Juan Manuel Zambrano Chaves, Eric Wang, Tao Tu, Eeshit Dhaval Vaishnav, Byron Lee, S. Sara Mahdavi, Christopher Semturs, David Fleet, Vivek Natarajan, Shekoofeh Azizi

    Abstract: Developing therapeutics is a lengthy and expensive process that requires the satisfaction of many different criteria, and AI models capable of expediting the process would be invaluable. However, the majority of current AI approaches address only a narrowly defined set of tasks, often circumscribed within a particular domain. To bridge this gap, we introduce Tx-LLM, a generalist large language mod… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  23. arXiv:2405.20421  [pdf, other

    cs.AI

    Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA

    Authors: Qianqi Yan, Xuehai He, Xiang Yue, Xin Eric Wang

    Abstract: Large Multimodal Models (LMMs) have shown remarkable progress in medical Visual Question Answering (Med-VQA), achieving high accuracy on existing benchmarks. However, their reliability under robust evaluation is questionable. This study reveals that when subjected to simple probing evaluation, state-of-the-art models perform worse than random guessing on medical diagnosis questions. To address thi… ▽ More

    Submitted 10 September, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  24. arXiv:2405.07387  [pdf, other

    cs.LG

    Semantic Loss Functions for Neuro-Symbolic Structured Prediction

    Authors: Kareem Ahmed, Stefano Teso, Paolo Morettin, Luca Di Liello, Pierfrancesco Ardino, Jacopo Gobbi, Yitao Liang, Eric Wang, Kai-Wei Chang, Andrea Passerini, Guy Van den Broeck

    Abstract: Structured output prediction problems are ubiquitous in machine learning. The prominent approach leverages neural networks as powerful feature extractors, otherwise assuming the independence of the outputs. These outputs, however, jointly encode an object, e.g. a path in a graph, and are therefore related through the structure underlying the output space. We discuss the semantic loss, which inject… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Preprint of Ch. 22 "Semantic Loss Functions for Neuro-Symbolic Structured Prediction" in "Compendium of Neurosymbolic Artificial Intelligence", https://ebooks.iospress.nl/ISBN/978-1-64368-406-2. arXiv admin note: substantial text overlap with arXiv:2201.11250, arXiv:2007.13197

  25. arXiv:2405.04834  [pdf, other

    cs.CV

    FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

    Authors: Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar A Sigurdsson, Nanyun Peng, Xin Eric Wang

    Abstract: Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps. Nevertheless, current controllable T2I methods commonly face challenges related to efficiency and faithfulness, especially when conditioning on multiple inputs from either the same or diverse modalities. In this paper, we propose a novel Flexibl… ▽ More

    Submitted 21 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  26. arXiv:2405.03162  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Advancing Multimodal Medical Capabilities of Gemini

    Authors: Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng , et al. (22 additional authors not shown)

    Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histop… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  27. arXiv:2404.13220  [pdf

    cs.CR cs.LG

    Security and Privacy Product Inclusion

    Authors: Dave Kleidermacher, Emmanuel Arriaga, Eric Wang, Sebastian Porst, Roger Piqueras Jover

    Abstract: In this paper, we explore the challenges of ensuring security and privacy for users from diverse demographic backgrounds. We propose a threat modeling approach to identify potential risks and countermeasures for product inclusion in security and privacy. We discuss various factors that can affect a user's ability to achieve a high level of security and privacy, including low-income demographics, p… ▽ More

    Submitted 23 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  28. arXiv:2404.05717  [pdf, other

    cs.CV cs.AI

    SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

    Authors: Jing Gu, Yilin Wang, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang

    Abstract: Effective editing of personal content holds a pivotal role in enabling individuals to express their creativity, weaving captivating narratives within their visual stories, and elevate the overall quality and impact of their visual content. Therefore, in this work, we introduce SwapAnything, a novel framework that can swap any objects in an image with personalized concepts given by the reference, w… ▽ More

    Submitted 6 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: 18 pages, 16 figures, 3 tables

  29. arXiv:2403.12945  [pdf, other

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park , et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  30. arXiv:2403.09974  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery

    Authors: Enguang Wang, Zhimao Peng, Zhengyuan Xie, Fei Yang, Xialei Liu, Ming-Ming Cheng

    Abstract: Given unlabelled datasets containing both old and new categories, generalized category discovery (GCD) aims to accurately discover new classes while correctly classifying old classes, leveraging the class concepts learned from labeled samples. Current GCD methods only use a single visual modality of information, resulting in poor classification of visually similar classes. As a different modality,… ▽ More

    Submitted 10 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  31. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  32. arXiv:2403.03956  [pdf, other

    cs.IR cs.CL

    Backtracing: Retrieving the Cause of the Query

    Authors: Rose E. Wang, Pawan Wirawarn, Omar Khattab, Noah Goodman, Dorottya Demszky

    Abstract: Many online content portals allow users to ask questions to supplement their understanding (e.g., of lectures). While information retrieval (IR) systems may provide answers for such user queries, they do not directly assist content creators -- such as lecturers who want to improve their content -- identify segments that _caused_ a user to ask those questions. We introduce the task of backtracing,… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/rosewang2008/backtracing; EACL 2024 Findings, Long Paper

  33. arXiv:2402.19441  [pdf

    cs.GR

    3D Gaussian Model for Animation and Texturing

    Authors: Xiangzhi Eric Wang, Zackary P. T. Sin

    Abstract: 3D Gaussian Splatting has made a marked impact on neural rendering by achieving impressive fidelity and performance. Despite this achievement, however, it is not readily applicable to developing interactive applications. Real-time applications like XR apps and games require functions such as animation, UV-mapping, and model editing simultaneously manipulated through the usage of a 3D model. We pro… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

  34. arXiv:2402.19233  [pdf, other

    cs.CY

    Shared lightweight autonomous vehicles for urban food deliveries: A simulation study

    Authors: Ainhoa Genua Cerviño, Naroa Coretti Sanchez, Elaine Liu Wang, Arnaud Grignard, Kent Larson

    Abstract: In recent years, the rapid growth of on-demand deliveries, especially in food deliveries, has spurred the exploration of innovative mobility solutions. In this context, lightweight autonomous vehicles have emerged as a potential alternative. However, their fleet-level behavior remains largely unexplored. To address this gap, we have developed an agent-based model and an environmental impact study… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 17 pages, 25 including abstract, 16 figures, journal paper

  35. arXiv:2402.15690  [pdf, other

    cs.CL cs.AI

    Foot In The Door: Understanding Large Language Model Jailbreaking via Cognitive Psychology

    Authors: Zhenhua Wang, Wei Xie, Baosheng Wang, Enze Wang, Zhiwen Gui, Shuoyoucheng Ma, Kai Chen

    Abstract: Large Language Models (LLMs) have gradually become the gateway for people to acquire new knowledge. However, attackers can break the model's security protection ("jail") to access restricted information, which is called "jailbreaking." Previous studies have shown the weakness of current LLMs when confronted with such jailbreaking attacks. Nevertheless, comprehension of the intrinsic decision-makin… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  36. arXiv:2402.14815  [pdf

    cs.CY cs.AI cs.CV cs.LG

    Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging

    Authors: Yuzhe Yang, Yujia Liu, Xin Liu, Avanti Gulhane, Domenico Mastrodicasa, Wei Wu, Edward J Wang, Dushyant W Sahani, Shwetak Patel

    Abstract: Advances in artificial intelligence (AI) have achieved expert-level performance in medical imaging applications. Notably, self-supervised vision-language foundation models can detect a broad spectrum of pathologies without relying on explicit training annotations. However, it is crucial to ensure that these AI models do not mirror or amplify human biases, thereby disadvantaging historically margin… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Code and data are available at https://github.com/YyzHarry/vlm-fairness

  37. arXiv:2402.09171  [pdf, other

    cs.SE

    Automated Unit Test Improvement using Large Language Models at Meta

    Authors: Nadia Alshahwan, Jubin Chheda, Anastasia Finegenova, Beliz Gokkaya, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang

    Abstract: This paper describes Meta's TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of filters that assure measurable improvement over the original test suite, thereby eliminating problems due to LLM hallucination. We describe the deployment of TestGen-LLM at Meta test-a-thons for the Ins… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 12 pages, 8 figures, 32nd ACM Symposium on the Foundations of Software Engineering (FSE 24)

  38. arXiv:2402.06111  [pdf, other

    cs.SE

    Observation-based unit test generation at Meta

    Authors: Nadia Alshahwan, Mark Harman, Alexandru Marginean, Rotem Tal, Eddy Wang

    Abstract: TestGen automatically generates unit tests, carved from serialized observations of complex objects, observed during app execution. We describe the development and deployment of TestGen at Meta. In particular, we focus on the scalability challenges overcome during development in order to deploy observation-based test carving at scale in industry. So far, TestGen has landed 518 tests into production… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 12 pages, 8 figures, FSE 2024, Mon 15 - Fri 19 July 2024, Porto de Galinhas, Brazil

  39. arXiv:2402.05111  [pdf, other

    cs.CL cs.AI

    Edu-ConvoKit: An Open-Source Library for Education Conversation Data

    Authors: Rose E. Wang, Dorottya Demszky

    Abstract: We introduce Edu-ConvoKit, an open-source library designed to handle pre-processing, annotation and analysis of conversation data in education. Resources for analyzing education conversation data are scarce, making the research challenging to perform and therefore hard to access. We address these challenges with Edu-ConvoKit. Edu-ConvoKit is open-source (https://github.com/stanfordnlp/edu-convokit… ▽ More

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: https://github.com/stanfordnlp/edu-convokit https://edu-convokit.readthedocs.io/en/latest/

  40. arXiv:2402.04380  [pdf, other

    cs.SE

    Assured LLM-Based Software Engineering

    Authors: Nadia Alshahwan, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang

    Abstract: In this paper we address the following question: How can we use Large Language Models (LLMs) to improve code independently of a human, while ensuring that the improved code - does not regress the properties of the original code? - improves the original in a verifiable and measurable way? To address this question, we advocate Assured LLM-Based Software Engineering; a generate-and-test approac… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 6 pages, 1 figure, InteNSE 24: ACM International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, April, 2024, Lisbon, Portugal

  41. arXiv:2402.03396  [pdf, other

    cs.SE cs.AI cs.CL cs.CR cs.LG

    UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing

    Authors: Yifeng He, Jiabo Huang, Yuyang Rong, Yiwen Guo, Ethan Wang, Hao Chen

    Abstract: The remarkable capability of large language models (LLMs) in generating high-quality code has drawn increasing attention in the software testing community. However, existing code LLMs often demonstrate unsatisfactory capabilities in generating accurate and complete tests since they were trained on code snippets collected without differentiating between code for testing purposes and other code. In… ▽ More

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 8 pages, 5 figures

  42. arXiv:2402.00319  [pdf, other

    cs.CV cs.AI

    SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling

    Authors: Eileen Wang, Soyeon Caren Han, Josiah Poon

    Abstract: Visual storytelling aims to automatically generate a coherent story based on a given image sequence. Unlike tasks like image captioning, visual stories should contain factual descriptions, worldviews, and human social commonsense to put disjointed elements together to form a coherent and engaging human-writeable story. However, most models mainly focus on applying factual information and using tax… ▽ More

    Submitted 31 January, 2024; originally announced February 2024.

  43. arXiv:2401.16645  [pdf, other

    cs.LG

    Speeding up and reducing memory usage for scientific machine learning via mixed precision

    Authors: Joel Hayford, Jacob Goldman-Wetzler, Eric Wang, Lu Lu

    Abstract: Scientific machine learning (SciML) has emerged as a versatile approach to address complex computational science and engineering problems. Within this field, physics-informed neural networks (PINNs) and deep operator networks (DeepONets) stand out as the leading techniques for solving partial differential equations by incorporating both physical equations and experimental data. However, training P… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 25 pages, 7 figures

  44. arXiv:2401.15847  [pdf, other

    cs.CV cs.AI cs.CL

    Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA

    Authors: Yue Fan, Jing Gu, Kaiwen Zhou, Qianqi Yan, Shan Jiang, Ching-Chen Kuo, Xinze Guan, Xin Eric Wang

    Abstract: Multipanel images, commonly seen as web screenshots, posters, etc., pervade our daily lives. These images, characterized by their composition of multiple subfigures in distinct layouts, effectively convey information to people. Toward building advanced multimodal AI applications, such as agents that understand complex scenes and navigate through webpages, the skill of multipanel visual reasoning i… ▽ More

    Submitted 27 June, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: ACL 2024

  45. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  46. arXiv:2310.10648  [pdf, other

    cs.CL cs.AI

    Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes

    Authors: Rose E. Wang, Qingyang Zhang, Carly Robinson, Susanna Loeb, Dorottya Demszky

    Abstract: Scaling high-quality tutoring remains a major challenge in education. Due to growing demand, many platforms employ novice tutors who, unlike experienced educators, struggle to address student mistakes and thus fail to seize prime learning opportunities. Our work explores the potential of large language models (LLMs) to close the novice-expert knowledge gap in remediating math mistakes. We contribu… ▽ More

    Submitted 6 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: NAACL 2024. Code: https://github.com/rosewang2008/bridge

  47. arXiv:2310.10637  [pdf, other

    cs.CL

    "Mistakes Help Us Grow": Facilitating and Evaluating Growth Mindset Supportive Language in Classrooms

    Authors: Kunal Handa, Margaret Clapper, Jessica Boyle, Rose E Wang, Diyi Yang, David S Yeager, Dorottya Demszky

    Abstract: Teachers' growth mindset supportive language (GMSL)--rhetoric emphasizing that one's skills can be improved over time--has been shown to significantly reduce disparities in academic achievement and enhance students' learning outcomes. Although teachers espouse growth mindset principles, most find it difficult to adopt GMSL in their practice due the lack of effective coaching in this area. We explo… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

  48. arXiv:2310.05872  [pdf, other

    cs.CV cs.AI cs.CL

    ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

    Authors: Kaiwen Zhou, Kwonjoon Lee, Teruhisa Misu, Xin Eric Wang

    Abstract: In our work, we explore the synergistic capabilities of pre-trained vision-and-language models (VLMs) and large language models (LLMs) on visual commonsense reasoning (VCR) problems. We find that VLMs and LLMs-based decision pipelines are good at different kinds of VCR problems. Pre-trained VLMs exhibit strong performance for problems involving understanding the literal visual content, which we no… ▽ More

    Submitted 17 May, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  49. arXiv:2310.03903  [pdf, other

    cs.CL cs.MA

    LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models

    Authors: Saaket Agashe, Yue Fan, Anthony Reyna, Xin Eric Wang

    Abstract: The emergent reasoning and Theory of Mind (ToM) abilities demonstrated by Large Language Models (LLMs) make them promising candidates for developing coordination agents. In this study, we introduce a new LLM-Coordination Benchmark aimed at a detailed analysis of LLMs within the context of Pure Coordination Games, where participating agents need to cooperate for the most gain. This benchmark evalua… ▽ More

    Submitted 2 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

  50. arXiv:2310.03675  [pdf, other

    cs.LG

    Hadamard Domain Training with Integers for Class Incremental Quantized Learning

    Authors: Martin Schiemer, Clemens JS Schaefer, Jayden Parker Vap, Mark James Horeni, Yu Emma Wang, Juan Ye, Siddharth Joshi

    Abstract: Continual learning is a desirable feature in many modern machine learning applications, which allows in-field adaptation and updating, ranging from accommodating distribution shift, to fine-tuning, and to learning new tasks. For applications with privacy and low latency requirements, the compute and memory demands imposed by continual learning can be cost-prohibitive for resource-constraint edge p… ▽ More

    Submitted 5 October, 2023; originally announced October 2023.