Showing 1–50 of 375 results for author: Cheng, M

Searching in archive cs.
  1. arXiv:2407.04557  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    Structural Constraint Integration in Generative Model for Discovery of Quantum Material Candidates

    Authors: Ryotaro Okabe, Mouyang Cheng, Abhijatmedhi Chotrattanapituk, Nguyen Tuan Hung, Xiang Fu, Bowen Han, Yao Wang, Weiwei Xie, Robert J. Cava, Tommi S. Jaakkola, Yongqiang Cheng, Mingda Li

    Abstract: Billions of organic molecules are known, but only a tiny fraction of the functional inorganic materials have been discovered, a particularly relevant problem to the community searching for new quantum materials. Recent advancements in machine-learning-based generative models, particularly diffusion models, show great promise for generating new, stable materials. However, integrating geometric patt…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 512 pages total, 4 main figures + 218 supplementary figures

  2. arXiv:2407.04305  [pdf, other]

    cs.CV

    Towards Stable 3D Object Detection

    Authors: Jiabao Wang, Qiang Meng, Guochao Liu, Liujiang Yan, Ke Wang, Ming-Ming Cheng, Qibin Hou

    Abstract: In autonomous driving, the temporal stability of 3D object detection greatly impacts the driving safety. However, the detection stability cannot be accessed by existing metrics such as mAP and MOTA, and consequently is less explored by the community. To bridge this gap, this work proposes Stability Index (SI), a new metric that can comprehensively evaluate the stability of 3D detectors in terms of…

    Submitted 5 July, 2024; originally announced July 2024.

  3. arXiv:2407.04179  [pdf, other]

    cs.CL

    Defense Against Syntactic Textual Backdoor Attacks with Token Substitution

    Authors: Xinglin Li, Xianwen He, Yao Li, Minhao Cheng

    Abstract: Textual backdoor attacks present a substantial security risk to Large Language Models (LLM). It embeds carefully chosen triggers into a victim model at the training stage, and makes the model erroneously predict inputs containing the same triggers as a certain class. Prior backdoor defense methods primarily target special token-based triggers, leaving syntax-based triggers insufficiently addressed…

    Submitted 4 July, 2024; originally announced July 2024.

  4. arXiv:2407.00256  [pdf, other]

    cs.AI cs.CL cs.LG stat.ML

    One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

    Authors: Ruochen Wang, Sohyun An, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, Cho-Jui Hsieh

    Abstract: Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also restricted the searched prompt to one instruction.…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: ICML 2024. Code available at https://github.com/ruocwang/mixture-of-prompts

    MSC Class: 68T01

    Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 2024

  5. arXiv:2406.17806  [pdf, other]

    cs.CL cs.AI cs.CR cs.CV cs.LG

    MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?

    Authors: Xirui Li, Hengguang Zhou, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Cho-Jui Hsieh

    Abstract: Humans are prone to cognitive distortions -- biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies. While these models are designed to respond queries under safety mechanism, they sometimes reject harmless queries in the presence of…

    Submitted 22 June, 2024; originally announced June 2024.

  6. arXiv:2406.04727  [pdf, other]

    cs.LG cond-mat.soft cs.AI

    Predicting Polymer Properties Based on Multimodal Multitask Pretraining

    Authors: Fanmeng Wang, Wentao Guo, Minjie Cheng, Shen Yuan, Hongteng Xu, Zhifeng Gao

    Abstract: In the past few decades, polymers, high-molecular-weight compounds formed by bonding numerous identical or similar monomers covalently, have played an essential role in various scientific fields. In this context, accurate prediction of their properties is becoming increasingly crucial. Typically, the properties of a polymer, such as plasticity, conductivity, bio-compatibility, and so on, are highl…

    Submitted 7 June, 2024; originally announced June 2024.

  7. arXiv:2406.02965  [pdf, other]

    cs.CV

    Understanding the Impact of Negative Prompts: When and How Do They Take Effect?

    Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Boqing Gong, Cho-Jui Hsieh

    Abstract: The concept of negative prompts, emerging from conditional generation models like Stable Diffusion, allows users to specify what to exclude from the generated images. Despite the widespread use of negative prompts, their intrinsic mechanisms remain largely unexplored. This paper presents the first comprehensive study to uncover how and when negative…

    Submitted 5 June, 2024; originally announced June 2024.

  8. arXiv:2406.01970  [pdf, other]

    cs.CV cs.AI

    The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise

    Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Boqing Gong, Cho-Jui Hsieh, Minhao Cheng

    Abstract: Diffusion models have achieved remarkable success in text-to-image generation tasks; however, the role of initial noise has been rarely explored. In this study, we identify specific regions within the initial noise image, termed trigger patches, that play a key role for object generation in the resulting images. Notably, these patches are "universal" and can be generalized across various positio…

    Submitted 4 June, 2024; originally announced June 2024.

  9. arXiv:2406.00816  [pdf, other]

    cs.LG cs.CR cs.CV

    Invisible Backdoor Attacks on Diffusion Models

    Authors: Sen Li, Junchi Ma, Minhao Cheng

    Abstract: In recent years, diffusion models have achieved remarkable success in the realm of high-quality image generation, garnering increased attention. This surge in interest is paralleled by a growing concern over the security threats associated with diffusion models, largely attributed to their susceptibility to malicious exploitation. Notably, recent research has brought to light the vulnerability of…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/invisibleTriggerDiffusion/invisible_triggers_for_diffusion

  10. arXiv:2406.00670  [pdf, other]

    cs.CV

    Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

    Authors: Yunheng Li, ZhongYu Li, Quansheng Zeng, Qibin Hou, Ming-Ming Cheng

    Abstract: Pre-trained vision-language models, e.g., CLIP, have been successfully applied to zero-shot semantic segmentation. Existing CLIP-based approaches primarily utilize visual features from the last layer to align with text embeddings, while they neglect the crucial information in intermediate layers that contain rich object details. However, we find that directly aggregating the multi-level visual fea…

    Submitted 6 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  11. arXiv:2405.18991  [pdf, other]

    cs.CV cs.CL cs.MM

    EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

    Authors: Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, Jun Huang

    Abstract: This paper presents EasyAnimate, an advanced method for video generation that leverages the power of transformer architecture for high-performance outcomes. We have expanded the DiT framework originally designed for 2D image synthesis to accommodate the complexities of 3D video generation by incorporating a motion module block. It is used to capture temporal dynamics, thereby ensuring the producti…

    Submitted 5 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures

  12. arXiv:2405.11430  [pdf, other]

    cs.CL

    MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation

    Authors: Jianbo Dai, Jianqiao Lu, Yunlong Feng, Rongju Ruan, Ming Cheng, Haochen Tan, Zhijiang Guo

    Abstract: Recent advancements in large language models (LLMs) have greatly improved code generation, specifically at the function level. For instance, GPT-4 has achieved an 88.4% pass rate on HumanEval. However, this draws into question the adequacy of existing benchmarks in thoroughly assessing function-level code generation capabilities. Our study analyzed two common benchmarks, HumanEval and MBPP, and fo…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 39 pages, dataset and code are available at https://github.com/SparksofAGI/MHPP

  13. arXiv:2405.06975  [pdf, other]

    cs.LG

    Input Snapshots Fusion for Scalable Discrete Dynamic Graph Nerual Networks

    Authors: QingGuo Qi, Hongyang Chen, Minhao Cheng, Han Liu

    Abstract: Dynamic graphs are ubiquitous in the real world, yet there is a lack of suitable theoretical frameworks to effectively extend existing static graph models into the temporal domain. Additionally, for link prediction tasks on discrete dynamic graphs, the requirement of substantial GPU memory to store embeddings of all nodes hinders the scalability of existing models. In this paper, we introduce an I…

    Submitted 11 May, 2024; originally announced May 2024.

  14. arXiv:2405.01434  [pdf, other]

    cs.CV

    StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

    Authors: Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou

    Abstract: For recent diffusion-based generative models, maintaining consistent content across a series of generated images, especially those containing subjects and complex details, presents a significant challenge. In this paper, we propose a new way of self-attention calculation, termed Consistent Self-Attention, that significantly boosts the consistency between the generated images and augments prevalent…

    Submitted 2 May, 2024; originally announced May 2024.

  15. arXiv:2405.00390  [pdf, other]

    cs.CL

    CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models

    Authors: Hongzhan Lin, Zixin Chen, Ziyang Luo, Mingfei Cheng, Jing Ma, Guang Chen

    Abstract: Social media abounds with multimodal sarcasm, and identifying sarcasm targets is particularly challenging due to the implicit incongruity not directly evident in the text and image modalities. Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed…

    Submitted 20 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: ACL 2024

  16. arXiv:2404.12605  [pdf, other]

    cs.AI

    GluMarker: A Novel Predictive Modeling of Glycemic Control Through Digital Biomarkers

    Authors: Ziyi Zhou, Ming Cheng, Xingjian Diao, Yanjun Cui, Xiangling Li

    Abstract: The escalating prevalence of diabetes globally underscores the need for diabetes management. Recent research highlights the growing focus on digital biomarkers in diabetes management, with innovations in computational frameworks and noninvasive monitoring techniques using personalized glucose metrics. However, they predominantly focus on insulin dosing and specific glucose values, or with limited…

    Submitted 18 April, 2024; originally announced April 2024.

  17. arXiv:2404.12400  [pdf, other]

    cs.LG

    Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning

    Authors: Ming Cheng, Ziyi Zhou, Bowen Zhang, Ziyu Wang, Jiaqi Gan, Ziang Ren, Weiqi Feng, Yi Lyu, Hefan Zhang, Xingjian Diao

    Abstract: In the landscape of spatio-temporal data analytics, effective trajectory representation learning is paramount. To bridge the gap of learning accurate representations with efficient and flexible mechanisms, we introduce Efflex, a comprehensive pipeline for transformative graph modeling and representation learning of the large-volume spatio-temporal trajectories. Efflex pioneers the incorporation of…

    Submitted 15 April, 2024; originally announced April 2024.

  18. arXiv:2404.11924  [pdf, other]

    cs.AI

    Toward Short-Term Glucose Prediction Solely Based on CGM Time Series

    Authors: Ming Cheng, Xingjian Diao, Ziyi Zhou, Yanjun Cui, Wenjun Liu, Shitong Cheng

    Abstract: The global diabetes epidemic highlights the importance of maintaining good glycemic control. Glucose prediction is a fundamental aspect of diabetes management, facilitating real-time decision-making. Recent research has introduced models focusing on long-term glucose trend prediction, which are unsuitable for real-time decision-making and result in delayed responses. Conversely, models designed to…

    Submitted 18 April, 2024; originally announced April 2024.

  19. arXiv:2404.10901  [pdf, other]

    cs.AI

    CrossGP: Cross-Day Glucose Prediction Excluding Physiological Information

    Authors: Ziyi Zhou, Ming Cheng, Yanjun Cui, Xingjian Diao, Zhaorui Ma

    Abstract: The increasing number of diabetic patients is a serious issue in society today, which has significant negative impacts on people's health and the country's financial expenditures. Because diabetes may develop into potential serious complications, early glucose prediction for diabetic patients is necessary for timely medical treatment. Existing glucose prediction methods typically utilize patients'…

    Submitted 16 April, 2024; originally announced April 2024.

  20. arXiv:2404.09419  [pdf]

    cs.CE

    Predicting Accurate Hot Spots in a More Than Ten-Thousand-Core GPU with a Million-Time Speedup over FEM Enabled by a Physics-based Learning Algorithm

    Authors: Lin Jian, Yu Liu, Ming-Cheng Cheng

    Abstract: The classical proper orthogonal decomposition (POD) with the Galerkin projection (GP) has been revised for chip-level thermal simulation of microprocessors with a large number of cores. An ensemble POD-GP methodology (EnPOD-GP) is introduced to significantly improve the training effectiveness and prediction accuracy by dividing a large number of heat sources into heat source blocks (HSBs) each of…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 8 pages, 8 figures

  21. arXiv:2404.09403  [pdf, other]

    cs.LG

    Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning

    Authors: Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan

    Abstract: Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world in autonomous systems and cyber-physical systems. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Different from most tra…

    Submitted 22 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: Accepted by ICLR 2024. Camera Ready Version

  22. arXiv:2404.08021  [pdf, other]

    cs.LG cs.AI cs.RO

    VeTraSS: Vehicle Trajectory Similarity Search Through Graph Modeling and Representation Learning

    Authors: Ming Cheng, Bowen Zhang, Ziyu Wang, Ziyi Zhou, Weiqi Feng, Yi Lyu, Xingjian Diao

    Abstract: Trajectory similarity search plays an essential role in autonomous driving, as it enables vehicles to analyze the information and characteristics of different trajectories to make informed decisions and navigate safely in dynamic environments. Existing work on the trajectory similarity search task primarily utilizes sequence-processing algorithms or Recurrent Neural Networks (RNNs), which suffer f…

    Submitted 11 April, 2024; originally announced April 2024.

  23. arXiv:2404.01651  [pdf, other]

    cs.CL cs.CY cs.HC cs.SI

    NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps

    Authors: Kristina Gligoric, Myra Cheng, Lucia Zheng, Esin Durmus, Dan Jurafsky

    Abstract: The use of words to convey speaker's intent is traditionally distinguished from the 'mention' of words for quoting what someone said, or pointing out properties of a word. Here we show that computationally modeling this use-mention distinction is crucial for dealing with counterspeech online. Counterspeech that refutes problematic content often mentions harmful language but is not harmful itself (…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: NAACL 2024 (Main conference)

  24. arXiv:2404.00146  [pdf, ps, other]

    cs.CV math.OC

    Fast OMP for Exact Recovery and Sparse Approximation

    Authors: Huiyuan Yu, Jia He, Maggie Cheng

    Abstract: Orthogonal Matching Pursuit (OMP) has been a powerful method in sparse signal recovery and approximation. However OMP suffers computational issue when the signal has large number of non-zeros. This paper advances OMP in two fronts: it offers a fast algorithm for the orthogonal projection of the input signal at each iteration, and a new selection criterion for making the greedy choice, which reduce…

    Submitted 29 March, 2024; originally announced April 2024.

  25. arXiv:2403.18469  [pdf, other]

    cs.CV cs.AI

    Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds

    Authors: Zhimin Yuan, Wankang Zeng, Yanfei Su, Weiquan Liu, Ming Cheng, Yulan Guo, Cheng Wang

    Abstract: 3D synthetic-to-real unsupervised domain adaptive segmentation is crucial to annotating new domains. Self-training is a competitive approach for this task, but its performance is limited by different sensor sampling patterns (i.e., variations in point density) and incomplete training strategies. In this work, we propose a density-guided translator (DGT), which translates point density between doma…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: CVPR2024

  26. arXiv:2403.18383  [pdf, other]

    cs.CV cs.AI cs.LG

    Generative Multi-modal Models are Good Class-Incremental Learners

    Authors: Xusheng Cao, Haori Lu, Linlan Huang, Xialei Liu, Ming-Ming Cheng

    Abstract: In class-incremental learning (CIL) scenarios, the phenomenon of catastrophic forgetting caused by the classifier's bias towards the current task has long posed a significant challenge. It is mainly caused by the characteristic of discriminative models. With the growing popularity of the generative multi-modal models, we would explore replacing discriminative models with generative ones for CIL. H…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024

  27. arXiv:2403.12372  [pdf, other]

    cs.LG

    Learning Transferable Time Series Classifier with Cross-Domain Pre-training from Language Model

    Authors: Mingyue Cheng, Xiaoyu Tao, Qi Liu, Hao Zhang, Yiheng Chen, Chenyi Lei

    Abstract: Advancements in self-supervised pre-training (SSL) have significantly advanced the field of learning transferable time series representations, which can be very useful in enhancing the downstream task. Despite being effective, most existing works struggle to achieve cross-domain SSL pre-training, missing valuable opportunities to integrate patterns and features from different domains. The main cha…

    Submitted 18 March, 2024; originally announced March 2024.

  28. arXiv:2403.12371  [pdf, other]

    cs.LG

    Advancing Time Series Classification with Multimodal Language Modeling

    Authors: Mingyue Cheng, Yiheng Chen, Qi Liu, Zhiding Liu, Yucong Luo

    Abstract: For the advancements of time series classification, scrutinizing previous studies, most existing methods adopt a common learning-to-classify paradigm - a time series classifier model tries to learn the relation between sequence inputs and target label encoded by one-hot distribution. Although effective, this paradigm conceals two inherent limitations: (1) encoding target categories with one-hot di…

    Submitted 18 March, 2024; originally announced March 2024.

  29. arXiv:2403.11735  [pdf, other]

    cs.CV cs.LG

    LSKNet: A Foundation Lightweight Backbone for Remote Sensing

    Authors: Yuxuan Li, Xiang Li, Yimian Dai, Qibin Hou, Li Liu, Yongxiang Liu, Ming-Ming Cheng, Jian Yang

    Abstract: Remote sensing images pose distinct challenges for downstream tasks due to their inherent complexity. While a considerable amount of research has been dedicated to remote sensing classification, object detection and semantic segmentation, most of these studies have overlooked the valuable prior knowledge embedded within remote sensing scenarios. Such prior knowledge can be useful because remote se…

    Submitted 23 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.09030

  30. arXiv:2403.10893  [pdf, other]

    cs.CR

    A Watermark-Conditioned Diffusion Model for IP Protection

    Authors: Rui Min, Sen Li, Hongyang Chen, Minhao Cheng

    Abstract: The ethical need to protect AI-generated content has been a significant concern in recent years. While existing watermarking strategies have demonstrated success in detecting synthetic content (detection), there has been limited exploration in identifying the users responsible for generating these outputs from a single model (owner identification). In this paper, we focus on both practical scenari…

    Submitted 16 July, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

  31. arXiv:2403.09974  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery

    Authors: Enguang Wang, Zhimao Peng, Zhengyuan Xie, Fei Yang, Xialei Liu, Ming-Ming Cheng

    Abstract: Given unlabelled datasets containing both old and new categories, generalized category discovery (GCD) aims to accurately discover new classes while correctly classifying old classes, leveraging the class concepts learned from labeled samples. Current GCD methods only use a single visual modality of information, resulting in poor classification of visually similar classes. As a different modality,…

    Submitted 10 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  32. Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform

    Authors: Mingyue Cheng, Hao Zhang, Jiqian Yang, Qi Liu, Li Li, Xin Huang, Liwei Song, Zhi Li, Zhenya Huang, Enhong Chen

    Abstract: Large language model evaluation plays a pivotal role in the enhancement of its capacity. Previously, numerous methods for evaluating large language models have been proposed in this area. Despite their effectiveness, these existing works mainly focus on assessing objective questions, overlooking the capability to evaluate subjective questions which is extremely common for large language models. Ad…

    Submitted 13 March, 2024; originally announced March 2024.

  33. arXiv:2403.07623  [pdf, other]

    cs.IR

    Empowering Sequential Recommendation from Collaborative Signals and Semantic Relatedness

    Authors: Mingyue Cheng, Hao Zhang, Qi Liu, Fajie Yuan, Zhi Li, Zhenya Huang, Enhong Chen, Jun Zhou, Longfei Li

    Abstract: Sequential recommender systems (SRS) could capture dynamic user preferences by modeling historical behaviors ordered in time. Despite effectiveness, focusing only on the collaborative signals from behaviors does not fully grasp user interests. It is also significant to model the semantic relatedness reflected in content features, e.g., images and text. Towards that end, in this p…

    Submitted 12 March, 2024; originally announced March 2024.

  34. arXiv:2403.07032  [pdf, other]

    cs.CV cs.AI

    STARFlow: Spatial Temporal Feature Re-embedding with Attentive Learning for Real-world Scene Flow

    Authors: Zhiyang Lu, Qinghan Chen, Ming Cheng

    Abstract: Scene flow prediction is a crucial underlying task in understanding dynamic scenes as it offers fundamental motion information. However, contemporary scene flow methods encounter three major challenges. Firstly, flow estimation solely based on local receptive fields lacks long-dependency matching of point pairs. To address this issue, we propose global attentive flow embedding to match all-to-all…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 10 pages, 8 figures, CVPR template

  35. arXiv:2403.06534  [pdf, other]

    cs.CV cs.AI cs.CE cs.LG

    SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection

    Authors: Yuxuan Li, Xiang Li, Weijie Li, Qibin Hou, Li Liu, Ming-Ming Cheng, Jian Yang

    Abstract: Synthetic Aperture Radar (SAR) object detection has gained significant attention recently due to its irreplaceable all-weather imaging capabilities. However, this research field suffers from both limited public datasets (mostly comprising <2K images with only mono-category objects) and inaccessible source code. To tackle these challenges, we establish a new benchmark dataset and an open-source met…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 22 Pages, 10 Figures, 9 Tables

  36. arXiv:2403.05738  [pdf, other]

    cs.LG cs.GT

    Provable Policy Gradient Methods for Average-Reward Markov Potential Games

    Authors: Min Cheng, Ruida Zhou, P. R. Kumar, Chao Tian

    Abstract: We study Markov potential games under the infinite horizon average reward criterion. Most previous studies have been for discounted rewards. We prove that both algorithms based on independent policy gradient and independent natural policy gradient converge globally to a Nash equilibrium for the average reward criterion. To set the stage for gradient-based methods, we first establish that the avera…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 38 pages, 7 figures, published to AISTAT-24

  37. arXiv:2403.01971  [pdf, other]

    cs.SE

    ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs

    Authors: Jiaolong Kong, Mingfei Cheng, Xiaofei Xie, Shangqing Liu, Xiaoning Du, Qi Guo

    Abstract: Automated Program Repair (APR) aims to automatically generate patches for rectifying software bugs. Recent strides in Large Language Models (LLM), such as ChatGPT, have yielded encouraging outcomes in APR, especially within the conversation-driven APR framework. Nevertheless, the efficacy of conversation-driven APR is contingent on the quality of the feedback information. In this paper, we propose…

    Submitted 7 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  38. arXiv:2403.01966  [pdf, other]

    cs.CV

    Enhancing Information Maximization with Distance-Aware Contrastive Learning for Source-Free Cross-Domain Few-Shot Learning

    Authors: Huali Xu, Li Liu, Shuaifeng Zhi, Shaojing Fu, Zhuo Su, Ming-Ming Cheng, Yongxiang Liu

    Abstract: Existing Cross-Domain Few-Shot Learning (CDFSL) methods require access to source domain data to train a model in the pre-training phase. However, due to increasing concerns about data privacy and the desire to reduce data transmission and training costs, it is necessary to develop a CDFSL solution without accessing source data. For this reason, this paper explores a Source-Free CDFSL (SF-CDFSL) pr…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by TIP, 16 pages, 11 figures, 8 tables

  39. arXiv:2403.01700  [pdf, other]

    cs.SD cs.MM eess.AS

    Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer

    Authors: Haoxu Wang, Ming Cheng, Qiang Fu, Ming Li

    Abstract: In recent years, neural network-based Wake Word Spotting achieves good performance on clean audio samples but struggles in noisy environments. Audio-Visual Wake Word Spotting (AVWWS) receives lots of attention because visual lip movement information is not affected by complex acoustic scenes. Previous works usually use simple addition or concatenation for multi-modal fusion. The inter-modal correl…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: Accepted by ICASSP 2024

  40. arXiv:2403.01493  [pdf, other]

    cs.LG

    ConvTimeNet: A Deep Hierarchical Fully Convolutional Model for Multivariate Time Series Analysis

    Authors: Mingyue Cheng, Jiqian Yang, Tingyue Pan, Qi Liu, Zhi Li

    Abstract: This paper introduces ConvTimeNet, a novel deep hierarchical fully convolutional network designed to serve as a general-purpose model for time series analysis. The key design of this network is twofold, designed to overcome the limitations of traditional convolutional networks. Firstly, we propose an adaptive segmentation of time series into sub-series level patches, treating these as fundamental…

    Submitted 3 March, 2024; originally announced March 2024.

  41. arXiv:2402.18393  [pdf, other]

    cs.AI cs.NE cs.RO cs.SE

    Evaluating Decision Optimality of Autonomous Driving via Metamorphic Testing

    Authors: Mingfei Cheng, Yuan Zhou, Xiaofei Xie, Junjie Wang, Guozhu Meng, Kairui Yang

    Abstract: Autonomous Driving System (ADS) testing is crucial in ADS development, with the current primary focus being on safety. However, the evaluation of non-safety-critical performance, particularly the ADS's ability to make optimal decisions and produce optimal paths for autonomous vehicles (AVs), is equally vital to ensure the intelligence and reduce risks of AVs. Currently, there is little work dedica…

    Submitted 28 February, 2024; originally announced February 2024.

  42. arXiv:2402.17403  [pdf, other]

    cs.CV

    Sora Generates Videos with Stunning Geometrical Consistency

    Authors: Xuanyi Li, Daquan Zhou, Chenxu Zhang, Shaodong Wei, Qibin Hou, Ming-Ming Cheng

    Abstract: The recently developed Sora model [1] has exhibited remarkable capabilities in video generation, sparking intense discussions regarding its ability to simulate real-world phenomena. Despite its growing popularity, there is a lack of established metrics to evaluate its fidelity to real-world physics quantitatively. In this paper, we introduce a new benchmark that assesses the quality of the generat…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 5 pages, 3 figures

  43. arXiv:2402.16914  [pdf, other]

    cs.CR cs.AI cs.CL

    DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers

    Authors: Xirui Li, Ruochen Wang, Minhao Cheng, Tianyi Zhou, Cho-Jui Hsieh

    Abstract: The safety alignment of Large Language Models (LLMs) is vulnerable to both manual and automated jailbreak attacks, which adversarially trigger LLMs to output harmful content. However, current methods for jailbreaking LLMs, which nest entire harmful prompts, are not effective at concealing malicious intent and can be easily identified and rejected by well-aligned LLMs. This paper discovers that dec…

    Submitted 1 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

  44. Generative Pretrained Hierarchical Transformer for Time Series Forecasting

    Authors: Zhiding Liu, Jiqian Yang, Mingyue Cheng, Yucong Luo, Zhi Li

    Abstract: Recent efforts have been dedicated to enhancing time series forecasting accuracy by introducing advanced network architectures and self-supervised pretraining strategies. Nevertheless, existing approaches still exhibit two critical drawbacks. Firstly, these methods often rely on a single dataset for training, limiting the model's generalizability due to the restricted scale of the training data. S…

    Submitted 17 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted by KDD'24 Research Track

  45. arXiv:2402.15751  [pdf, other]

    cs.LG cs.AI cs.CL

    Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning

    Authors: Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You

    Abstract: While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO) optimizers, recently proposed to address this issue, only require forward passes during training, making them more memory-friendly. However, the quality of gradient est…

    Submitted 24 February, 2024; originally announced February 2024.

  46. arXiv:2402.12928  [pdf, other]

    cs.DL cs.AI cs.CV

    A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence

    Authors: Penghai Zhao, Xin Zhang, Ming-Ming Cheng, Jian Yang, Xiang Li

    Abstract: By consolidating scattered knowledge, the literature review provides a comprehensive understanding of the investigated topic. However, reading, conducting, or peer-reviewing review papers generally demands a significant investment of time and effort from researchers. To improve efficiency, this paper aims to provide a thorough review of reviews in the PAMI field from diverse perspectives. First, t…

    Submitted 24 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: IEEE version v1. [February 19, 2024] IEEE version v2 with typos fixed. [February 23, 2024] IEEE version v3 with errors fixed. [February 29, 2024] IEEE version v4 with improved quality. [February 29, 2024]

  47. arXiv:2402.12741  [pdf, other]

    cs.CV

    MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion

    Authors: Sen Li, Ruochen Wang, Cho-Jui Hsieh, Minhao Cheng, Tianyi Zhou

    Abstract: Existing text-to-image models still struggle to generate images of multiple objects, especially in handling their spatial positions, relative sizes, overlapping, and attribute bindings. To efficiently address these challenges, we develop a training-free Multimodal-LLM agent (MuLan), as a human painter, that can progressively generate multi-object with intricate planning and feedback control. MuLan…

    Submitted 24 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Added the application to human-agent interaction; added discussion with concurrent work

  48. arXiv:2402.11241  [pdf, other]

    cs.CV cs.AI

    DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model

    Authors: Yu Feng, Xing Shi, Mengli Cheng, Yun Xiong

    Abstract: As the task of 2D-to-3D reconstruction has gained significant attention in various real-world scenarios, it becomes crucial to be able to generate high-quality point clouds. Despite the recent success of deep learning models in generating point clouds, there are still challenges in producing high-fidelity results due to the disparities between images and point clouds. While vision transformers (Vi…

    Submitted 17 February, 2024; originally announced February 2024.

  49. arXiv:2402.11129  [pdf, other]

    cs.CL

    BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering

    Authors: Haoyu Wang, Ruirui Li, Haoming Jiang, Jinjin Tian, Zhengyang Wang, Chen Luo, Xianfeng Tang, Monica Cheng, Tuo Zhao, Jing Gao

    Abstract: Retrieval-augmented Large Language Models (LLMs) offer substantial benefits in enhancing performance across knowledge-intensive scenarios. However, these methods often face challenges with complex inputs and encounter difficulties due to noisy knowledge retrieval, notably hindering model effectiveness. To address this issue, we introduce BlendFilter, a novel approach that elevates retrieval-augmen…

    Submitted 11 July, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  50. arXiv:2402.02056  [pdf, other]

    cs.CL cs.AI cs.CY

    AnthroScore: A Computational Linguistic Measure of Anthropomorphism

    Authors: Myra Cheng, Kristina Gligoric, Tiziano Piccardi, Dan Jurafsky

    Abstract: Anthropomorphism, or the attribution of human-like characteristics to non-human entities, has shaped conversations about the impacts and possibilities of technology. We present AnthroScore, an automatic metric of implicit anthropomorphism in language. We use a masked language model to quantify how non-human entities are implicitly framed as human by the surrounding context. We show that AnthroScor…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: EACL 2024 Main Conference