Showing 1–50 of 216 results for author: Chang, M

  1. arXiv:2408.07009  [pdf, other]

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google: Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh, et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  2. arXiv:2407.20845  [pdf, other]

    cs.CV cs.HC cs.LG

    Assessing Graphical Perception of Image Embedding Models using Channel Effectiveness

    Authors: Soohyun Lee, Minsuk Chang, Seokhyeon Park, Jinwook Seo

    Abstract: Recent advancements in vision models have greatly improved their ability to handle complex chart understanding tasks, like chart captioning and question answering. However, it remains challenging to assess how these models process charts. Existing benchmarks only roughly evaluate model performance without evaluating the underlying mechanisms, such as how models extract image embeddings. This limit…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: In Proceedings of the 2024 IEEE Visualization and Visual Analytics (VIS)

  3. arXiv:2407.07331  [pdf, ps, other]

    cs.CV cs.AI

    Learning with Instance-Dependent Noisy Labels by Anchor Hallucination and Hard Sample Label Correction

    Authors: Po-Hsuan Huang, Chia-Ching Lin, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

    Abstract: Learning from noisy-labeled data is crucial for real-world applications. Traditional Noisy-Label Learning (NLL) methods categorize training data into clean and noisy sets based on the loss distribution of training samples. However, they often neglect that clean samples, especially those with intricate visual patterns, may also yield substantial losses. This oversight is particularly significant in…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ICIP 2024

  4. arXiv:2406.16807  [pdf, other]

    cs.LG cs.CL cs.CV

    Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

    Authors: Katherine M. Collins, Najoung Kim, Yonatan Bitton, Verena Rieser, Shayegan Omidshafiei, Yushi Hu, Sherol Chen, Senjuti Dutta, Minsuk Chang, Kimin Lee, Youwei Liang, Georgina Evans, Sahil Singla, Gang Li, Adrian Weller, Junfeng He, Deepak Ramachandran, Krishnamurthy Dj Dvijotham

    Abstract: Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form the feedback should take for learning an accurate reward function has not been conclusively established. This paper investigates the effectiveness of fine-grained feedback which captures nuanced distinctions in image quality and prompt-alignment, compared to traditional co…

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.13121  [pdf, other]

    cs.CL cs.AI cs.IR

    Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

    Authors: Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

    Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages. Dataset available at https://github.com/google-deepmind/loft

  6. arXiv:2405.20163  [pdf, other]

    cs.CL cs.AI

    Reasoning about concepts with LLMs: Inconsistencies abound

    Authors: Rosario Uceda-Sosa, Karthikeyan Natesan Ramamurthy, Maria Chang, Moninder Singh

    Abstract: The ability to summarize and organize knowledge into abstract concepts is key to learning and reasoning. Many industrial applications rely on the consistent and systematic use of concepts, especially when dealing with decision-critical knowledge. However, we demonstrate that, when methodically questioned, large language models (LLMs) often display and demonstrate significant inconsistencies in the…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 15 pages, 5 figures, 3 tables

  7. arXiv:2404.18831  [pdf, other]

    cs.CV cs.AI

    ConPro: Learning Severity Representation for Medical Images using Contrastive Learning and Preference Optimization

    Authors: Hong Nguyen, Hoang Nguyen, Melinda Chang, Hieu Pham, Shrikanth Narayanan, Michael Pazzani

    Abstract: Understanding the severity of conditions shown in images in medical diagnosis is crucial, serving as a key guide for clinical assessment, treatment, as well as evaluating longitudinal progression. This paper proposes ConPrO: a novel representation learning method for severity assessment in medical images using Contrastive learning-integrated Preference Optimization. Different from conventional co…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 8 pages

  8. arXiv:2404.09432  [pdf, other]

    cs.CV cs.AI cs.LG

    The 8th AI City Challenge

    Authors: Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty, Sanjita Prajapati, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Fady Alnajjar, Ganzorig Batnasan, Ping-Yang Chen, Jun-Wei Hsieh, Xunlei Wu, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Rama Chellappa

    Abstract: The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC)…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Summary of the 8th AI City Challenge Workshop in conjunction with CVPR 2024

  9. arXiv:2404.06609  [pdf, other]

    cs.AI cs.RO

    GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

    Authors: Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi

    Abstract: The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images. However, these navigation models often handle only a single input modality as the target. With the progress achieved so far, it is time to move towards universal navigation models capable of handling various goal types, enabling more…

    Submitted 9 April, 2024; originally announced April 2024.

  10. arXiv:2404.04289  [pdf, ps, other]

    cs.AI cs.HC cs.LG

    Designing for Human-Agent Alignment: Understanding what humans want from their agents

    Authors: Nitesh Goyal, Minsuk Chang, Michael Terry

    Abstract: Our ability to build autonomous agents that leverage Generative AI continues to increase by the day. As builders and users of such agents it is unclear what parameters we need to align on before the agents start performing tasks on our behalf. To discover these parameters, we ran a qualitative empirical research study about designing agents that can negotiate during a fictional yet relatable task…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Human-AI Alignment, Human-Agent Alignment, Agents, Generative AI, Large Language Models

    ACM Class: I.2.0

  11. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  12. arXiv:2403.20327  [pdf, other]

    cs.CL cs.AI

    Gecko: Versatile Text Embeddings Distilled from Large Language Models

    Authors: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Iftekhar Naim

    Abstract: We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 18 pages

  13. arXiv:2403.19651  [pdf, other]

    cs.CV cs.AI cs.CL cs.IR cs.MM

    MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

    Authors: Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang

    Abstract: Image retrieval, i.e., finding desired images given a reference image, inherently encompasses rich, multi-faceted search intents that are difficult to capture solely using image-based measures. Recent works leverage text instructions to allow users to more freely express their search intents. However, they primarily focus on image pairs that are visually similar and/or can be characterized by a sm…

    Submitted 24 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: ICML 2024 (Oral); Project Website: https://open-vision-language.github.io/MagicLens/

  14. arXiv:2403.16350  [pdf, other]

    eess.IV cs.CV

    3D-EffiViTCaps: 3D Efficient Vision Transformer with Capsule for Medical Image Segmentation

    Authors: Dongwei Gan, Ming Chang, Juan Chen

    Abstract: Medical image segmentation (MIS) aims to finely segment various organs. It requires grasping global information from both parts and the entire image for better segmenting, and clinically there are often certain requirements for segmentation efficiency. Convolutional neural networks (CNNs) have made considerable achievements in MIS. However, they are difficult to fully collect global context inform…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 15 pages, 4 figures, submitted to ICPR2024

  15. arXiv:2403.15013  [pdf, other]

    cs.CV cs.HC

    Extracting Human Attention through Crowdsourced Patch Labeling

    Authors: Minsuk Chang, Seokhyeon Park, Hyeon Jeon, Aeri Cho, Soohyun Lee, Jinwook Seo

    Abstract: In image classification, a significant problem arises from bias in the datasets. When it contains only specific types of images, the classifier begins to rely on shortcuts - simplistic and erroneous rules for decision-making. This leads to high performance on the training dataset but inferior results on new, varied images, as the classifier's generalization capability is reduced. For example, if t…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 21 pages, 11 figures

  16. arXiv:2403.11573  [pdf, other]

    cs.CV

    Just Add $100 More: Augmenting NeRF-based Pseudo-LiDAR Point Cloud for Resolving Class-imbalance Problem

    Authors: Mincheol Chang, Siyeong Lee, Jinkyu Kim, Namil Kim

    Abstract: Typical LiDAR-based 3D object detection models are trained in a supervised manner with real-world data collection, which is often imbalanced over classes (or long-tailed). To deal with it, augmenting minority-class examples by sampling ground truth (GT) LiDAR points from a database and pasting them into a scene of interest is often used, but challenges still remain: inflexibility in locating GT sa…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 28 pages, 12 figures, 11 tables

  17. arXiv:2403.09704  [pdf, other]

    cs.CL cs.AI cs.LG

    Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

    Authors: Swapnaja Achintalwar, Ioana Baldini, Djallel Bouneffouf, Joan Byamugisha, Maria Chang, Pierre Dognin, Eitan Farchi, Ndivhuwo Makondo, Aleksandra Mojsilovic, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Inkit Padhi, Orna Raz, Jesus Rios, Prasanna Sattigeri, Moninder Singh, Siphiwe Thwala, Rosario A. Uceda-Sosa, Kush R. Varshney

    Abstract: The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and orchestrate between potentia…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 7 pages, 5 figures

  18. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  19. arXiv:2403.02363  [pdf, other]

    cs.LG cs.AI

    Addressing Long-Tail Noisy Label Learning Problems: a Two-Stage Solution with Label Refurbishment Considering Label Rarity

    Authors: Ying-Hsuan Wu, Jun-Wei Hsieh, Li Xin, Shin-You Teng, Yi-Kuan Hsieh, Ming-Ching Chang

    Abstract: Real-world datasets commonly exhibit noisy labels and class imbalance, such as long-tailed distributions. While previous research addresses this issue by differentiating noisy and clean samples, reliance on information from predictions based on noisy long-tailed data introduces potential errors. To overcome the limitations of prior works, we introduce an effective two-stage approach by combining s…

    Submitted 4 March, 2024; originally announced March 2024.

  20. arXiv:2402.17768  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning

    Authors: Xiaoyu Zhang, Matthew Chang, Pranav Kumar, Saurabh Gupta

    Abstract: A common failure mode for policies trained with imitation is compounding execution errors at test time. When the learned policy encounters states that are not present in the expert demonstrations, the policy fails, leading to degenerate behavior. The Dataset Aggregation, or DAgger approach to this problem simply collects more data to cover these failure states. However, in practice, this is often…

    Submitted 5 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by Robotics: Science and Systems (RSS) 2024. project website with video, see https://sites.google.com/view/diffusion-meets-dagger

  21. arXiv:2402.12627  [pdf, other]

    cs.LG cs.AI cs.CV

    A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective

    Authors: Jeng-Lin Li, Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

    Abstract: Recent artificial intelligence (AI) technologies show remarkable evolution in various academic fields and industries. However, in the real world, dynamic data lead to principal challenges for deploying AI models. An unexpected data change brings about severe performance degradation in AI models. We identify two major related research fields, domain shift and concept drift according to the setting…

    Submitted 19 February, 2024; originally announced February 2024.

  22. arXiv:2402.10524  [pdf, other]

    cs.HC cs.AI cs.CL cs.LG

    LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models

    Authors: Minsuk Kahng, Ian Tenney, Mahima Pushkarna, Michael Xieyang Liu, James Wexler, Emily Reif, Krystal Kallarackal, Minsuk Chang, Michael Terry, Lucas Dixon

    Abstract: Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs). However, analyzing the results from this evaluation approach raises scalability and interpretability challenges. In this paper, we present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluat…

    Submitted 16 February, 2024; originally announced February 2024.

  23. arXiv:2401.08422  [pdf, other]

    cs.CV

    Improving Limited Supervised Foot Ulcer Segmentation Using Cross-Domain Augmentation

    Authors: Shang-Jui Kuo, Po-Han Huang, Chia-Ching Lin, Jeng-Lin Li, Ming-Ching Chang

    Abstract: Diabetic foot ulcers pose health risks, including higher morbidity, mortality, and amputation rates. Monitoring wound areas is crucial for proper care, but manual segmentation is subjective due to complex wound features and background variation. Expert annotations are costly and time-intensive, thus hampering large dataset creation. Existing segmentation models relying on extensive annotations are…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 5 pages, 2 figures, accepted by ICASSP 2024

  24. Synthetic Information towards Maximum Posterior Ratio for deep learning on Imbalanced Data

    Authors: Hung Nguyen, Morris Chang

    Abstract: This study examines the impact of class-imbalanced data on deep learning models and proposes a technique for data balancing by generating synthetic data for the minority class. Unlike random-based oversampling, our method prioritizes balancing the informative regions by identifying high entropy samples. Generating well-placed synthetic data can enhance machine learning algorithms' accuracy and effi…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE Transactions on Artificial Intelligence

    Journal ref: IEEE Transactions on Artificial Intelligence 2023

  25. arXiv:2401.02586  [pdf, other]

    cs.LG cs.AI

    Federated Learning for distribution skewed data using sample weights

    Authors: Hung Nguyen, Peiyuan Wu, Morris Chang

    Abstract: One of the most challenging issues in federated learning is that the data is often not independent and identically distributed (non-IID). Clients are expected to contribute the same type of data, drawn from one global distribution. However, data are often collected in different ways from different resources. Thus, the data distributions among clients might be different from the underlying global…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE Transactions on Artificial Intelligence

    Journal ref: IEEE Transactions on Artificial Intelligence 2023

  26. arXiv:2401.01952  [pdf, other]

    cs.CV cs.AI cs.CL

    Instruct-Imagen: Image Generation with Multi-modal Instruction

    Authors: Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia

    Abstract: This paper presents instruct-imagen, a model that tackles heterogeneous image generation tasks and generalizes across unseen tasks. We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision. It uses natural language to amalgamate disparate modalities (e.g., text, edge, style, subject, etc.), such that abundant gener…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: 20 pages, 18 figures

  27. arXiv:2312.16771  [pdf, other]

    cs.CV

    Scale-Aware Crowd Count Network with Annotation Error Correction

    Authors: Yi-Kuan Hsieh, Jun-Wei Hsieh, Yu-Chee Tseng, Ming-Ching Chang, Li Xin

    Abstract: Traditional crowd counting networks suffer from information loss when feature maps are downsized through pooling layers, leading to inaccuracies in counting crowds at a distance. Existing methods often assume correct annotations during training, disregarding the impact of noisy annotations, especially in crowded scenes. Furthermore, the use of a fixed Gaussian kernel fails to account for the varyi…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: 7 pages, 6 figures. arXiv admin note: text overlap with arXiv:2211.06835

  28. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  29. arXiv:2312.06583  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    3D Hand Pose Estimation in Egocentric Images in the Wild

    Authors: Aditya Prakash, Ruisen Tu, Matthew Chang, Saurabh Gupta

    Abstract: We present WildHands, a method for 3D hand pose estimation in egocentric images in the wild. This is challenging due to (a) lack of 3D hand pose annotations for images in the wild, and (b) a form of perspective distortion-induced shape ambiguity that arises in the analysis of crops around hands. For the former, we use auxiliary supervision on in-the-wild data in the form of segmentation masks & gr…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Project page: https://ap229997.github.io/projects/hands/

  30. A New Benchmark and Model for Challenging Image Manipulation Detection

    Authors: Zhenfei Zhang, Mingyang Li, Ming-Ching Chang

    Abstract: The ability to detect manipulation in multimedia data is vital in digital forensics. Existing Image Manipulation Detection (IMD) methods are mainly based on detecting anomalous features arising from image editing or double compression artifacts. All existing IMD techniques encounter challenges when it comes to detecting small tampered regions from a large image. Moreover, compression-based IMD appr…

    Submitted 31 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: 9 pages, 6 figures, 3 tables. AAAI-24

  31. arXiv:2311.12176  [pdf, ps, other]

    cs.IT

    Covert Online Decision Making: From Sequential Hypothesis Testing to Stochastic Bandits

    Authors: Meng-Che Chang, Matthieu R. Bloch

    Abstract: We study the problem of covert online decision-making in which an agent attempts to identify a parameter governing a system by probing the system while escaping detection from an adversary. The system is modeled as a Markov kernel whose input is controlled by the agent and whose two outputs are observed by the agent and the adversary, respectively. This problem is motivated by applications such as…

    Submitted 20 November, 2023; originally announced November 2023.

  32. arXiv:2311.06430  [pdf, other]

    cs.RO

    GOAT: GO to Any Thing

    Authors: Matthew Chang, Theophile Gervet, Mukul Khanna, Sriram Yenamandra, Dhruv Shah, So Yeon Min, Kavit Shah, Chris Paxton, Saurabh Gupta, Dhruv Batra, Roozbeh Mottaghi, Jitendra Malik, Devendra Singh Chaplot

    Abstract: In deployment scenarios such as homes and warehouses, mobile robots are expected to autonomously navigate for extended periods, seamlessly executing tasks articulated in terms that are intuitively understandable by human operators. We present GO To Any Thing (GOAT), a universal navigation system capable of tackling these requirements with three key features: a) Multimodal: it can tackle goals spec…

    Submitted 10 November, 2023; originally announced November 2023.

  33. arXiv:2310.11257  [pdf, other]

    cs.CV cs.MM cs.RO

    An empirical study of automatic wildlife detection using drone thermal imaging and object detection

    Authors: Miao Chang, Tan Vuong, Manas Palaparthi, Lachlan Howell, Alessio Bonti, Mohamed Abdelrazek, Duc Thanh Nguyen

    Abstract: Artificial intelligence has the potential to make valuable contributions to wildlife management through cost-effective methods for the collection and interpretation of wildlife data. Recent advances in remotely piloted aircraft systems (RPAS or "drones") and thermal imaging technology have created new approaches to collect wildlife data. These emerging technologies could provide promising altern…

    Submitted 17 October, 2023; originally announced October 2023.

  34. arXiv:2310.10021  [pdf, other]

    cs.RO cs.AI cs.LG

    Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance

    Authors: Jesse Zhang, Jiahui Zhang, Karl Pertsch, Ziyi Liu, Xiang Ren, Minsuk Chang, Shao-Hua Sun, Joseph J. Lim

    Abstract: We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by growing a learned skill library with minimal supervision. Prior work in reinforcement learning requires expert supervision, in the form of demonstrations or rich reward functions, to learn long-horizon tasks. Instead, our approach BOSS (BOotStrapping your own Skills) learns to accompli…

    Submitted 17 October, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

    Comments: CoRL 2023 (Oral); 24 pages, 11 figures

  35. arXiv:2310.01505  [pdf, other]

    cs.AI

    Towards Automatic Design of Factorio Blueprints

    Authors: Sean Patterson, Joan Espasa, Mun See Chang, Ruth Hoffmann

    Abstract: Factorio is a 2D construction and management simulation video game about building automated factories to produce items of increasing complexity. A core feature of the game is its blueprint system, which allows players to easily save and replicate parts of their designs. Blueprints can reproduce any layout of objects in the game, but are typically used to encapsulate a complex behaviour, such as th…

    Submitted 2 October, 2023; originally announced October 2023.

  36. arXiv:2309.02640  [pdf, other]

    cs.LG cs.CL

    Epi-Curriculum: Episodic Curriculum Learning for Low-Resource Domain Adaptation in Neural Machine Translation

    Authors: Keyu Chen, Di Zhuang, Mingchen Li, J. Morris Chang

    Abstract: Neural Machine Translation (NMT) models have become successful, but their performance remains poor when translating on new domains with a limited amount of data. In this paper, we present a novel approach Epi-Curriculum to address low-resource domain adaptation (DA), which contains a new episodic training framework along with denoised curriculum learning. Our episodic training framework enhances t…

    Submitted 5 September, 2023; originally announced September 2023.

  37. arXiv:2308.12817  [pdf, other]

    cs.CV

    MixNet: Toward Accurate Detection of Challenging Scene Text in the Wild

    Authors: Yu-Xiang Zeng, Jun-Wei Hsieh, Xin Li, Ming-Ching Chang

    Abstract: Detecting small scene text instances in the wild is particularly challenging, where the influence of irregular positions and nonideal lighting often leads to detection errors. We present MixNet, a hybrid architecture that combines the strengths of CNNs and Transformers, capable of accurately detecting small text from challenging natural scenes, regardless of the orientations, styles, and lighting…

    Submitted 27 August, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

  38. arXiv:2308.07897  [pdf, other]

    cond-mat.mtrl-sci cs.AI

    Probabilistic Phase Labeling and Lattice Refinement for Autonomous Material Research

    Authors: Ming-Chiang Chang, Sebastian Ament, Maximilian Amsler, Duncan R. Sutherland, Lan Zhou, John M. Gregoire, Carla P. Gomes, R. Bruce van Dover, Michael O. Thompson

    Abstract: X-ray diffraction (XRD) is an essential technique to determine a material's crystal structure in high-throughput experimentation, and has recently been incorporated in artificially intelligent agents in autonomous scientific discovery processes. However, rapid, automated and reliable analysis method of XRD data matching the incoming data rate remains a major challenge. To address these issues, we…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 13 pages, 6 figures

  39. arXiv:2307.07096  [pdf, other]

    eess.AS cs.SD

    Low Rank Properties for Estimating Microphones Start Time and Sources Emission Time

    Authors: Faxian Cao, Yongqiang Cheng, Adil Mehmood Khan, Zhijing Yang, S. M. Ahsan Kazmi, Yingxiu Chang

    Abstract: Uncertainty in timing information pertaining to the start time of microphone recordings and sources' emission time pose significant challenges in various applications, such as joint microphones and sources localization. Traditional optimization methods, which directly estimate this unknown timing information (UTIm), often fall short compared to approaches exploiting the low-rank property (LRP). LR…

    Submitted 21 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: 13 pages for main content; 9 pages for proof of proposed low rank properties; 13 figures

  40. arXiv:2306.10376  [pdf, other]

    cs.RO cs.AI

    CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents

    Authors: Jeongeun Park, Seungwon Lim, Joonhyung Lee, Sangbeom Park, Minsuk Chang, Youngjae Yu, Sungjoon Choi

    Abstract: In this paper, we focus on inferring whether the given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents utilizing large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs to classify whether the command is certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once the command is classified a…

    Submitted 26 June, 2024; v1 submitted 17 June, 2023; originally announced June 2023.

    Comments: Project Page: https://clararobot.github.io/

  41. arXiv:2306.06542  [pdf, ps, other]

    cs.HC cs.CY cs.LG

    Investigating Practices and Opportunities for Cross-functional Collaboration around AI Fairness in Industry Practice

    Authors: Wesley Hanwen Deng, Nur Yildirim, Monica Chang, Motahhare Eslami, Ken Holstein, Michael Madaio

    Abstract: An emerging body of research indicates that ineffective cross-functional collaboration -- the interdisciplinary work done by industry practitioners across roles -- represents a major barrier to addressing issues of fairness in AI design and development. In this research, we sought to better understand practitioners' current practices and tactics to enact cross-functional collaboration for AI fairn…

    Submitted 10 June, 2023; originally announced June 2023.

    Comments: In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT '23)

  42. arXiv:2305.17449  [pdf, ps, other]

    cs.CV

    FishEye8K: A Benchmark and Dataset for Fisheye Camera Object Detection

    Authors: Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Erkhembayar Ganbold, Jun-Wei Hsieh, Ming-Ching Chang, Ping-Yang Chen, Byambaa Dorj, Hamad Al Jassmi, Ganzorig Batnasan, Fady Alnajjar, Mohammed Abduljabbar, Fang-Pang Lin

    Abstract: With the advance of AI, road object detection has been a prominent topic in computer vision, mostly using perspective cameras. Fisheye lens provides omnidirectional wide coverage for using fewer cameras to monitor road intersections, however with view distortions. To our knowledge, there is no existing open dataset prepared for traffic surveillance on fisheye cameras. This paper introduces an open…

    Submitted 6 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: CVPR Workshops 2023

  43. arXiv:2305.17262  [pdf, other]

    cs.CV cs.AI

    Im-Promptu: In-Context Composition from Image Prompts

    Authors: Bhishma Dedhia, Michael Chang, Jake C. Snell, Thomas L. Griffiths, Niraj K. Jha

    Abstract: Large language models are few-shot learners that can solve diverse tasks from a handful of demonstrations. This implicit understanding of tasks suggests that the attention mechanisms over word tokens may play a role in analogical reasoning. In this work, we investigate whether analogical reasoning can enable in-context composition over composable elements of visual stimuli. First, we introduce a s…

    Submitted 22 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

  44. arXiv:2305.16301  [pdf, other]

    cs.CV cs.LG cs.RO

    Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos

    Authors: Matthew Chang, Aditya Prakash, Saurabh Gupta

    Abstract: The analysis and use of egocentric videos for robotic tasks is made challenging by occlusion due to the hand and the visual mismatch between the human hand and a robot end-effector. In this sense, the human hand presents a nuisance. However, often hands also provide a valuable signal, e.g. the hand pose may suggest what kind of object is being held. In this work, we propose to extract a factored r…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: for project website with video, see https://matthewchang.github.io/vidm/

  45. arXiv:2305.11694  [pdf, other]

    cs.CL

    QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations

    Authors: Chaitanya Malaviya, Peter Shaw, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

    Abstract: Formulating selective information needs results in queries that implicitly specify set operations, such as intersection, union, and difference. For instance, one might search for "shorebirds that are not sandpipers" or "science-fiction films shot in England". To study the ability of retrieval systems to meet such information needs, we construct QUEST, a dataset of 3357 natural language queries wit…

    Submitted 31 May, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: ACL 2023; Dataset available at https://github.com/google-research/language/tree/master/language/quest

  46. arXiv:2305.03036  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning Hand-Held Object Reconstruction from In-The-Wild Videos

    Authors: Aditya Prakash, Matthew Chang, Matthew Jin, Saurabh Gupta

    Abstract: Prior works for reconstructing hand-held objects from a single image rely on direct 3D shape supervision which is challenging to gather in real world at scale. Consequently, these approaches do not generalize well when presented with novel objects in in-the-wild settings. While 3D supervision is a major bottleneck, there is an abundance of in-the-wild raw video data showing hand-object interaction…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: Project Webpage: https://ap229997.github.io/projects/horse/

  47. arXiv:2304.09823  [pdf, other]

    cs.CY cs.AI

    The Future of ChatGPT-enabled Labor Market: A Preliminary Study in China

    Authors: Lan Chen, Xi Chen, Shiyu Wu, Yaqi Yang, Meng Chang, Hengshu Zhu

    Abstract: As a phenomenal large language model, ChatGPT has achieved unparalleled success in various real-world tasks and increasingly plays an important role in our daily lives and work. However, extensive concerns are also raised about the potential ethical issues, especially about whether ChatGPT-like artificial general intelligence (AGI) will replace human jobs. To this end, in this paper, we introduce…

    Submitted 4 November, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

  48. arXiv:2304.07500  [pdf, other]

    cs.CV

    The 7th AI City Challenge

    Authors: Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff, Pranamesh Chakraborty, Sanjita Prajapati, Alice Li, Shangru Li, Krishna Kunadharaju, Shenxin Jiang, Rama Chellappa

    Abstract: The AI City Challenge's seventh edition emphasizes two domains at the intersection of computer vision and artificial intelligence - retail business and Intelligent Traffic Systems (ITS) - that have considerable untapped potential. The 2023 challenge had five tracks, which drew a record-breaking number of participation requests from 508 teams across 46 countries. Track 1 was a brand new track that…

    Submitted 15 April, 2023; originally announced April 2023.

    Comments: Summary of the 7th AI City Challenge Workshop in conjunction with CVPR 2023

  49. arXiv:2304.04947  [pdf, other]

    cs.CL

    Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

    Authors: Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, Yanqi Zhou, Nan Du, Vincent Y. Zhao, Yuexin Wu, Bo Li, Yu Zhang, Ming-Wei Chang

    Abstract: We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a light-w…

    Submitted 26 November, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: NeurIPS camera ready version

  50. arXiv:2304.01982  [pdf, other]

    cs.CL cs.IR

    Rethinking the Role of Token Retrieval in Multi-Vector Retrieval

    Authors: Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Y. Zhao

    Abstract: Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020] allow token-level interactions between queries and documents, and hence achieve state of the art on many information retrieval benchmarks. However, their non-linear scoring function cannot be scaled to millions of documents, necessitating a three-stage process for inference: retrieving initial candidates via token retrieval,…

    Submitted 8 April, 2024; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023. Code available at https://github.com/google-deepmind/xtr