Skip to main content

Showing 1–50 of 665 results for author: Zhou, T

Searching in archive cs. Search in all archives.
.
  1. arXiv:2409.02810  [pdf, other

    math.NA cs.AI

    A hybrid FEM-PINN method for time-dependent partial differential equations

    Authors: Xiaodong Feng, Haojiong Shangguan, Tao Tang, Xiaoliang Wan, Tao Zhou

    Abstract: In this work, we present a hybrid numerical method for solving evolution partial differential equations (PDEs) by merging the time finite element method with deep neural networks. In contrast to the conventional deep learning-based formulation where the neural network is defined on a spatiotemporal domain, our methodology utilizes finite element basis functions in the time direction where the spac… ▽ More

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 25pages

  2. arXiv:2408.16403  [pdf, other

    cs.LG

    DeepSPoC: A Deep Learning-Based PDE Solver Governed by Sequential Propagation of Chaos

    Authors: Kai Du, Yongle Xie, Tao Zhou, Yuancheng Zhou

    Abstract: Sequential propagation of chaos (SPoC) is a recently developed tool to solve mean-field stochastic differential equations and their related nonlinear Fokker-Planck equations. Based on the theory of SPoC, we present a new method (deepSPoC) that combines the interacting particle system of SPoC and deep learning. Under the framework of deepSPoC, two classes of frequently used deep models include full… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

  3. arXiv:2408.12957  [pdf, other

    cs.CV

    Image Segmentation in Foundation Model Era: A Survey

    Authors: Tianfei Zhou, Fei Zhang, Boyu Chang, Wenguan Wang, Ye Yuan, Ender Konukoglu, Daniel Cremers

    Abstract: Image segmentation is a long-standing challenge in computer vision, studied continuously over several decades, as evidenced by seminal algorithms such as N-Cut, FCN, and MaskFormer. With the advent of foundation models (FMs), contemporary segmentation methodologies have embarked on a new epoch by either adapting FMs (e.g., CLIP, Stable Diffusion, DINO) for image segmentation or developing dedicate… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: A comprehensive survey of image segmentation in foundation model era (work in progress)

  4. arXiv:2408.11843  [pdf, other

    cs.CL cs.AI

    Editable Fairness: Fine-Grained Bias Mitigation in Language Models

    Authors: Ruizhe Chen, Yichen Li, Jianfei Yang, Joey Tianyi Zhou, Zuozhu Liu

    Abstract: Generating fair and accurate predictions plays a pivotal role in deploying large language models (LLMs) in the real world. However, existing debiasing methods inevitably generate unfair or incorrect predictions as they are designed and evaluated to achieve parity across different social groups but leave aside individual commonsense facts, resulting in modified knowledge that elicits unreasonable o… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2405.09341

  5. arXiv:2408.10006  [pdf, other

    cs.LG

    Unlocking the Power of LSTM for Long Term Time Series Forecasting

    Authors: Yaxuan Kong, Zepu Wang, Yuqi Nie, Tian Zhou, Stefan Zohren, Yuxuan Liang, Peng Sun, Qingsong Wen

    Abstract: Traditional recurrent neural network architectures, such as long short-term memory neural networks (LSTM), have historically held a prominent role in time series forecasting (TSF) tasks. While the recently introduced sLSTM for Natural Language Processing (NLP) introduces exponential gating and memory mixing that are beneficial for long term sequential learning, its potential short memory issue is… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  6. arXiv:2408.09406  [pdf, other

    cs.SI physics.soc-ph

    Uncovering multi-order Popularity and Similarity Mechanisms in Link Prediction by graphlet predictors

    Authors: Yong-Jian He, Yijun Ran, Zengru Di, Tao Zhou, Xiao-Ke Xu

    Abstract: Link prediction has become a critical problem in network science and has thus attracted increasing research interest. Popularity and similarity are two primary mechanisms in the formation of real networks. However, the roles of popularity and similarity mechanisms in link prediction across various domain networks remain poorly understood. Accordingly, this study used orbit degrees of graphlets to… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 40 pages, 9 figures

  7. arXiv:2408.08931  [pdf, other

    cs.IR cs.AI cs.LG

    Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach

    Authors: Zhiwei Li, Guodong Long, Tianyi Zhou, Jing Jiang, Chengqi Zhang

    Abstract: Federated Collaborative Filtering (FedCF) is an emerging field focused on developing a new recommendation framework with preserving privacy in a federated setting. Existing FedCF methods typically combine distributed Collaborative Filtering (CF) algorithms with privacy-preserving mechanisms, and then preserve personalized information into a user embedding vector. However, the user embedding is usu… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 10 pages, 3 figures, 4 tables, conference

  8. arXiv:2408.08567  [pdf, other

    cs.LG cs.CV eess.IV stat.ML

    S$^3$Attention: Improving Long Sequence Attention with Smoothed Skeleton Sketching

    Authors: Xue Wang, Tian Zhou, Jianqing Zhu, Jialin Liu, Kun Yuan, Tao Yao, Wotao Yin, Rong Jin, HanQin Cai

    Abstract: Attention based models have achieved many remarkable breakthroughs in numerous applications. However, the quadratic complexity of Attention makes the vanilla Attention based models hard to apply to long sequence tasks. Various improved Attention structures are proposed to reduce the computation cost by inducing low rankness and approximating the whole sequence by sub-sequences. The most challengin… ▽ More

    Submitted 23 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  9. arXiv:2408.07897  [pdf, other

    cs.LG cs.IR cs.MA eess.SY

    The Nah Bandit: Modeling User Non-compliance in Recommendation Systems

    Authors: Tianyue Zhou, Jung-Hoon Cho, Cathy Wu

    Abstract: Recommendation systems now pervade the digital world, ranging from advertising to entertainment. However, it remains challenging to implement effective recommendation systems in the physical world, such as in mobility or health. This work focuses on a key challenge: in the physical world, it is often easy for the user to opt out of taking any recommendation if they are not to her liking, and to fa… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 12 pages, 8 figures, under review

  10. Flexible 3D Lane Detection by Hierarchical Shape MatchingFlexible 3D Lane Detection by Hierarchical Shape Matching

    Authors: Zhihao Guan, Ruixin Liu, Zejian Yuan, Ao Liu, Kun Tang, Tong Zhou, Erlong Li, Chao Zheng, Shuqi Mei

    Abstract: As one of the basic while vital technologies for HD map construction, 3D lane detection is still an open problem due to varying visual conditions, complex typologies, and strict demands for precision. In this paper, an end-to-end flexible and hierarchical lane detector is proposed to precisely predict 3D lane lines from point clouds. Specifically, we design a hierarchical network predicting flexib… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  11. arXiv:2408.06927  [pdf, other

    cs.CV cs.LG

    Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator

    Authors: Xin Zhang, Jiawei Du, Ping Liu, Joey Tianyi Zhou

    Abstract: Dataset distillation has emerged as a technique aiming to condense informative features from large, natural datasets into a compact and synthetic form. While recent advancements have refined this technique, its performance is bottlenecked by the prevailing class-specific synthesis paradigm. Under this paradigm, synthetic data is optimized exclusively for a pre-assigned one-hot label, creating an i… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  12. arXiv:2408.04662  [pdf, other

    cs.CL cs.AI

    Citekit: A Modular Toolkit for Large Language Model Citation Generation

    Authors: Jiajun Shen, Tong Zhou, Suifeng Zhao, Yubo Chen, Kang Liu

    Abstract: Enabling Large Language Models (LLMs) to generate citations in Question-Answering (QA) tasks is an emerging paradigm aimed at enhancing the verifiability of their responses when LLMs are utilizing external references to generate an answer. However, there is currently no unified framework to standardize and fairly compare different citation generation methods, leading to difficulties in reproducing… ▽ More

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 7 pages, 13 figures

  13. arXiv:2408.04170  [pdf

    cs.CV

    M2EF-NNs: Multimodal Multi-instance Evidence Fusion Neural Networks for Cancer Survival Prediction

    Authors: Hui Luo, Jiashuang Huang, Hengrong Ju, Tianyi Zhou, Weiping Ding

    Abstract: Accurate cancer survival prediction is crucial for assisting clinical doctors in formulating treatment plans. Multimodal data, including histopathological images and genomic data, offer complementary and comprehensive information that can greatly enhance the accuracy of this task. However, the current methods, despite yielding promising results, suffer from two notable limitations: they do not eff… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

  14. FDiff-Fusion:Denoising diffusion fusion network based on fuzzy learning for 3D medical image segmentation

    Authors: Weiping Ding, Sheng Geng, Haipeng Wang, Jiashuang Huang, Tianyi Zhou

    Abstract: In recent years, the denoising diffusion model has achieved remarkable success in image segmentation modeling. With its powerful nonlinear modeling capabilities and superior generalization performance, denoising diffusion models have gradually been applied to medical image segmentation tasks, bringing new perspectives and methods to this field. However, existing methods overlook the uncertainty of… ▽ More

    Submitted 21 July, 2024; originally announced August 2024.

    Comments: This paper has been accepted by Information Fusion. Permission from Elsevier must be obtained for all other uses, in any current or future media. The final version is available at [doi:10.1016/J.INFFUS.2024.102540]

    Journal ref: Information Fusion, 2024: 102540

  15. FMDNN: A Fuzzy-guided Multi-granular Deep Neural Network for Histopathological Image Classification

    Authors: Weiping Ding, Tianyi Zhou, Jiashuang Huang, Shu Jiang, Tao Hou, Chin-Teng Lin

    Abstract: Histopathological image classification constitutes a pivotal task in computer-aided diagnostics. The precise identification and categorization of histopathological images are of paramount significance for early disease detection and treatment. In the diagnostic process of pathologists, a multi-tiered approach is typically employed to assess abnormalities in cell regions at different magnifications… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by IEEE Transactions on Fuzzy Systems for publication. Permission from IEEE must be obtained for all other uses, in any current or future media. The final version is available at [doi: 10.1109/TFUZZ.2024.3410929]

    Journal ref: IEEE Transactions on Fuzzy Systems ( Early Access ) 2024

  16. arXiv:2407.14746  [pdf, other

    cs.CV eess.IV

    Difflare: Removing Image Lens Flare with Latent Diffusion Model

    Authors: Tianwen Zhou, Qihao Duan, Zitong Yu

    Abstract: The recovery of high-quality images from images corrupted by lens flare presents a significant challenge in low-level vision. Contemporary deep learning methods frequently entail training a lens flare removing model from scratch. However, these methods, despite their noticeable success, fail to utilize the generative prior learned by pre-trained models, resulting in unsatisfactory performance in l… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: Accepted by BMVC 2024

  17. arXiv:2407.14504  [pdf, other

    cs.LG

    Nonlinear Schrödinger Network

    Authors: Yiming Zhou, Callen MacPhee, Tingyi Zhou, Bahram Jalali

    Abstract: Deep neural networks (DNNs) have achieved exceptional performance across various fields by learning complex nonlinear mappings from large-scale datasets. However, they encounter challenges such as high computational costs and limited interpretability. To address these issues, hybrid approaches that integrate physics with AI are gaining interest. This paper introduces a novel physics-based AI model… ▽ More

    Submitted 24 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  18. arXiv:2407.08133  [pdf, other

    cs.CV cs.AI

    Nonverbal Interaction Detection

    Authors: Jianan Wei, Tianfei Zhou, Yi Yang, Wenguan Wang

    Abstract: This work addresses a new challenge of understanding human nonverbal interaction in social contexts. Nonverbal signals pervade virtually every communicative act. Our gestures, facial expressions, postures, gaze, even physical appearance all convey messages, without anything being said. Despite their critical role in social life, nonverbal signals receive very limited attention as compared to the l… ▽ More

    Submitted 14 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Project page: https://github.com/weijianan1/NVI

  19. arXiv:2407.05633  [pdf, other

    cs.LG cs.CR

    AdaPI: Facilitating DNN Model Adaptivity for Efficient Private Inference in Edge Computing

    Authors: Tong Zhou, Jiahui Zhao, Yukui Luo, Xi Xie, Wujie Wen, Caiwen Ding, Xiaolin Xu

    Abstract: Private inference (PI) has emerged as a promising solution to execute computations on encrypted data, safeguarding user privacy and model parameters in edge computing. However, existing PI methods are predominantly developed considering constant resource constraints, overlooking the varied and dynamic resource constraints in diverse edge devices, like energy budgets. Consequently, model providers… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: ICCAD 2024 accepted publication

  20. arXiv:2407.05267  [pdf, other

    cs.CV

    DTR: A Unified Deep Tensor Representation Framework for Multimedia Data Recovery

    Authors: Ting-Wei Zhou, Xi-Le Zhao, Jian-Li Wang, Yi-Si Luo, Min Wang, Xiao-Xuan Bai, Hong Yan

    Abstract: Recently, the transform-based tensor representation has attracted increasing attention in multimedia data (e.g., images and videos) recovery problems, which consists of two indispensable components, i.e., transform and characterization. Previously, the development of transform-based tensor representation mainly focuses on the transform aspect. Although several attempts consider using shallow matri… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  21. arXiv:2407.03089  [pdf, other

    eess.SP cs.LG q-bio.NC

    Spatio-Temporal Adaptive Diffusion Models for EEG Super-Resolution in Epilepsy Diagnosis

    Authors: Tong Zhou, Shuqiang Wang

    Abstract: Electroencephalogram (EEG) technology, particularly high-density EEG (HD EEG) devices, is widely used in fields such as neuroscience. HD EEG devices improve the spatial resolution of EEG by placing more electrodes on the scalp, meeting the requirements of clinical diagnostic applications such as epilepsy focus localization. However, this technique faces challenges such as high acquisition costs an… ▽ More

    Submitted 6 August, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  22. arXiv:2407.02764  [pdf, other

    cs.OS

    Data-driven Software-based Power Estimation for Embedded Devices

    Authors: Haoyu Wang, Xinyi Li, Ti Zhou, Man Lin

    Abstract: Energy measurement of computer devices, which are widely used in the Internet of Things (IoT), is an important yet challenging task. Most of these IoT devices lack ready-to-use hardware or software for power measurement. A cost-effective solution is to use low-end consumer-grade power meters. However, these low-end power meters cannot provide accurate instantaneous power measurements. In this pape… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  23. arXiv:2407.02408  [pdf, other

    cs.CL cs.LG

    CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models

    Authors: Song Wang, Peng Wang, Tong Zhou, Yushun Dong, Zhen Tan, Jundong Li

    Abstract: As Large Language Models (LLMs) are increasingly deployed to handle various natural language processing (NLP) tasks, concerns regarding the potential negative societal impacts of LLM-generated content have also arisen. To evaluate the biases exhibited by LLMs, researchers have recently proposed a variety of datasets. However, existing bias evaluation efforts often focus on only a particular type o… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 37 pages, 32 figures

  24. arXiv:2407.00995  [pdf, other

    cs.CY eess.SY physics.app-ph

    Data on the Move: Traffic-Oriented Data Trading Platform Powered by AI Agent with Common Sense

    Authors: Yi Yu, Shengyue Yao, Tianchen Zhou, Yexuan Fu, Jingru Yu, Ding Wang, Xuhong Wang, Cen Chen, Yilun Lin

    Abstract: In the digital era, data has become a pivotal asset, advancing technologies such as autonomous driving. Despite this, data trading faces challenges like the absence of robust pricing methods and the lack of trustworthy trading mechanisms. To address these challenges, we introduce a traffic-oriented data trading platform named Data on The Move (DTM), integrating traffic simulation, data trading, an… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  25. arXiv:2407.00629  [pdf, ps, other

    cs.MA

    Identification of LFT Structured Descriptor Systems with Slow and Non-uniform Sampling

    Authors: Tong Zhou

    Abstract: Time domain identification is studied in this paper for parameters of a continuous-time multi-input multi-output descriptor system, with these parameters affecting system matrices through a linear fractional transformation. Sampling is permitted to be slow and non-uniform, and there are no necessities to satisfy the Nyquist frequency restrictions. This model can be used to described the behaviors… ▽ More

    Submitted 31 August, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: 15 pages

  26. arXiv:2407.00256  [pdf, other

    cs.AI cs.CL cs.LG stat.ML

    One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

    Authors: Ruochen Wang, Sohyun An, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, Cho-Jui Hsieh

    Abstract: Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also restricted the searched prompt to one instruction.… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: ICML 2024. code available at https://github.com/ruocwang/mixture-of-prompts

    MSC Class: 68T01

    Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 2024

  27. arXiv:2406.19364  [pdf, other

    cs.CV

    SimTxtSeg: Weakly-Supervised Medical Image Segmentation with Simple Text Cues

    Authors: Yuxin Xie, Tao Zhou, Yi Zhou, Geng Chen

    Abstract: Weakly-supervised medical image segmentation is a challenging task that aims to reduce the annotation cost while keep the segmentation performance. In this paper, we present a novel framework, SimTxtSeg, that leverages simple text cues to generate high-quality pseudo-labels and study the cross-modal fusion in training segmentation models, simultaneously. Our contribution consists of two key compon… ▽ More

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: accepted by MICCAI 2024

  28. arXiv:2406.18966  [pdf, other

    cs.CL

    UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

    Authors: Siyuan Wu, Yue Huang, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Xiangliang Zhang, Jianfeng Gao, Chaowei Xiao, Lichao Sun

    Abstract: Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets. Despite this, challenges remain in the areas of generalization, controllability, diversity, and truthfulness within the existing generative frameworks. To address these challenges, this pap… ▽ More

    Submitted 22 August, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  29. arXiv:2406.18313  [pdf, other

    cs.SD cs.CL eess.AS

    Advancing Airport Tower Command Recognition: Integrating Squeeze-and-Excitation and Broadcasted Residual Learning

    Authors: Yuanxi Lin, Tonglin Zhou, Yang Xiao

    Abstract: Accurate recognition of aviation commands is vital for flight safety and efficiency, as pilots must follow air traffic control instructions precisely. This paper addresses challenges in speech command recognition, such as noisy environments and limited computational resources, by advancing keyword spotting technology. We create a dataset of standardized airport tower commands, including routine an… ▽ More

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by IALP 2024

  30. arXiv:2406.17806  [pdf, other

    cs.CL cs.AI cs.CR cs.CV cs.LG

    MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?

    Authors: Xirui Li, Hengguang Zhou, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Cho-Jui Hsieh

    Abstract: Humans are prone to cognitive distortions -- biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies. While these models are designed to respond queries under safety mechanism, they sometimes reject harmless queries in the presence of… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  31. arXiv:2406.17231  [pdf, other

    cs.CL

    CogMG: Collaborative Augmentation Between Large Language Model and Knowledge Graph

    Authors: Tong Zhou, Yubo Chen, Kang Liu, Jun Zhao

    Abstract: Large language models have become integral to question-answering applications despite their propensity for generating hallucinations and factually inaccurate content. Querying knowledge graphs to reduce hallucinations in LLM meets the challenge of incomplete knowledge coverage in knowledge graphs. On the other hand, updating knowledge graphs by information extraction and knowledge graph completion… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  32. arXiv:2406.16229  [pdf, other

    cs.CL

    Multi-Objective Linguistic Control of Large Language Models

    Authors: Dang Nguyen, Jiuhai Chen, Tianyi Zhou

    Abstract: Large language models (LLMs), despite their breakthroughs on many challenging benchmark tasks, lean to generate verbose responses and lack the controllability of output complexity, which is usually preferred by human users in practice. In this paper, we study how to precisely control multiple linguistic complexities of LLM output by finetuning using off-the-shelf data. To this end, we propose mult… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  33. arXiv:2406.15938  [pdf, other

    cs.CL cs.AI cs.LG

    RuleR: Improving LLM Controllability by Rule-based Data Recycling

    Authors: Ming Li, Han Chen, Chenguang Wang, Dang Nguyen, Dianqi Li, Tianyi Zhou

    Abstract: Large language models (LLMs) still lack delicate controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which requires additional costs. To bridge this gap, we propose Rule-based Data Recycling (RuleR),… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  34. arXiv:2406.14721  [pdf, other

    cs.CL

    1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators?

    Authors: Yue Huang, Chenrui Fan, Yuan Li, Siyuan Wu, Tianyi Zhou, Xiangliang Zhang, Lichao Sun

    Abstract: Large Language Models (LLMs) have garnered significant attention due to their remarkable ability to process information across various languages. Despite their capabilities, they exhibit inconsistencies in handling identical queries in different languages, presenting challenges for further advancement. This paper introduces a method to enhance the multilingual performance of LLMs by aggregating kn… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  35. arXiv:2406.12463  [pdf, other

    cs.CV eess.IV

    LFMamba: Light Field Image Super-Resolution with State Space Model

    Authors: Wang xia, Yao Lu, Shunzhou Wang, Ziqi Wang, Peiqi Xia, Tianfei Zhou

    Abstract: Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often face challenges in capturing long-range dependencies (CNN-based) or encounter quadratic computational complexities (Transformer-based), which limit their performance. Recently, the State Space Model (SSM) with selective scan… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  36. arXiv:2406.10900  [pdf, other

    cs.CV cs.CL

    AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

    Authors: Xiyang Wu, Tianrui Guan, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Lee Boyd-Graber, Tianyi Zhou, Dinesh Manocha

    Abstract: Large vision-language models (LVLMs) hallucinate: certain context cues in an image may trigger the language module's overconfident and incorrect reasoning on abnormal or hypothetical objects. Though a few benchmarks have been developed to investigate LVLM hallucinations, they mainly rely on hand-crafted corner cases whose fail patterns may hardly generalize, and finetuning on them could undermine… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  37. arXiv:2406.10819  [pdf, other

    cs.CV cs.AI cs.CL

    GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

    Authors: Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have been used as agents to control keyboard and mouse inputs by directly perceiving the Graphical User Interface (GUI) and generating corresponding code. However, current agents primarily exhibit excellent understanding capabilities in static environments and are predominantly applied in relatively simple domains, such as Web or mobile interfaces… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  38. arXiv:2406.10323  [pdf, other

    cs.CL

    GenQA: Generating Millions of Instructions from a Handful of Prompts

    Authors: Jiuhai Chen, Rifaa Qadri, Yuxin Wen, Neel Jain, John Kirchenbauer, Tianyi Zhou, Tom Goldstein

    Abstract: Most public instruction finetuning datasets are relatively small compared to the closed source datasets used to train industry models. To study questions about finetuning at scale, such as curricula and learning rate cooldown schedules, there is a need for industrial-scale datasets. However, this scale necessitates a data generation process that is almost entirely automated. In this work, we study… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 9.5 pages, 6 Figures, and 3 tables in the main body. Dataset available at https://huggingface.co/datasets/tomg-group-umd/GenQA

  39. arXiv:2406.07657  [pdf, other

    cs.LG cs.CL

    OPTune: Efficient Online Preference Tuning

    Authors: Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Tianyi Zhou, Heng Huang

    Abstract: Reinforcement learning with human feedback~(RLHF) is critical for aligning Large Language Models (LLMs) with human preference. Compared to the widely studied offline version of RLHF, \emph{e.g.} direct preference optimization (DPO), recent works have shown that the online variants achieve even better alignment. However, online alignment requires on-the-fly generation of new training data, which is… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 16 pages, 7 figures

  40. arXiv:2406.06965  [pdf, other

    cs.CV

    Evolving from Single-modal to Multi-modal Facial Deepfake Detection: A Survey

    Authors: Ping Liu, Qiqi Tao, Joey Tianyi Zhou

    Abstract: This survey addresses the critical challenge of deepfake detection amidst the rapid advancements in artificial intelligence. As AI-generated media, including video, audio and text, become more realistic, the risk of misuse to spread misinformation and commit identity fraud increases. Focused on face-centric deepfakes, this work traces the evolution from traditional single-modality methods to sophi… ▽ More

    Submitted 14 August, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: P. Liu is with the Department of Computer Science and Engineering, University of Nevada, Reno, NV, 89512. Q. Tao and J. Zhou are with Centre for Frontier AI Research (CFAR), and Institute of High Performance Computing (IHPC), A*STAR, Singapore. J. Zhou is also with Centre for Advanced Technologies in Online Safety (CATOS), A*STAR, Singapore. J. Zhou is the corresponding author

  41. Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification

    Authors: Yuxin Hong, Xiao Zhang, Xin Zhang, Joey Tianyi Zhou

    Abstract: In the medical field, managing high-dimensional massive medical imaging data and performing reliable medical analysis from it is a critical challenge, especially in resource-limited environments such as remote medical facilities and mobile devices. This necessitates effective dataset compression techniques to reduce storage, transmission, and computational cost. However, existing coreset selection… ▽ More

    Submitted 2 September, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by ACM Multimedia 2024 (oral), see: https://openreview.net/forum?id=m1qrB9KSYD

  42. arXiv:2406.03445  [pdf, other

    cs.LG cs.CL

    Pre-trained Large Language Models Use Fourier Features to Compute Addition

    Authors: Tianyi Zhou, Deqing Fu, Vatsal Sharan, Robin Jia

    Abstract: Pre-trained large language models (LLMs) exhibit impressive mathematical reasoning capabilities, yet how they compute basic arithmetic, such as addition, remains unclear. This paper shows that pre-trained LLMs add numbers using Fourier features -- dimensions in the hidden state that represent numbers via a set of features sparse in the frequency domain. Within the model, MLP and attention layers u… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  43. arXiv:2406.02965  [pdf, other

    cs.CV

    Understanding the Impact of Negative Prompts: When and How Do They Take Effect?

    Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Boqing Gong, Cho-Jui Hsieh

    Abstract: The concept of negative prompts, emerging from conditional generation models like Stable Diffusion, allows users to specify what to exclude from the generated images.%, demonstrating significant practical efficacy. Despite the widespread use of negative prompts, their intrinsic mechanisms remain largely unexplored. This paper presents the first comprehensive study to uncover how and when negative… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  44. arXiv:2406.01970  [pdf, other

    cs.CV cs.AI

    The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise

    Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Boqing Gong, Cho-Jui Hsieh, Minhao Cheng

    Abstract: Diffusion models have achieved remarkable success in text-to-image generation tasks; however, the role of initial noise has been rarely explored. In this study, we identify specific regions within the initial noise image, termed trigger patches, that play a key role for object generation in the resulting images. Notably, these patches are ``universal'' and can be generalized across various positio… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  45. arXiv:2406.01946  [pdf, other

    cs.CR cs.CL

    Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature

    Authors: Tong Zhou, Xuandong Zhao, Xiaolin Xu, Shaolei Ren

    Abstract: Text watermarks for large language models (LLMs) have been commonly used to identify the origins of machine-generated content, which is promising for assessing liability when combating deepfake or harmful content. While existing watermarking techniques typically prioritize robustness against removal attacks, unfortunately, they are vulnerable to spoofing attacks: malicious actors can subtly alter… ▽ More

    Submitted 17 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  46. arXiv:2406.00956  [pdf, other

    cs.CV cs.LG eess.IV

    Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation

    Authors: Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang

    Abstract: The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entai… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Project Link: https://sam-auxol.github.io/AuxOL/

  47. arXiv:2405.20910  [pdf, other

    physics.app-ph cs.AI cs.CV physics.data-an

    Predicting ptychography probe positions using single-shot phase retrieval neural network

    Authors: Ming Du, Tao Zhou, Junjing Deng, Daniel J. Ching, Steven Henke, Mathew J. Cherukara

    Abstract: Ptychography is a powerful imaging technique that is used in a variety of fields, including materials science, biology, and nanotechnology. However, the accuracy of the reconstructed ptychography image is highly dependent on the accuracy of the recorded probe positions which often contain errors. These errors are typically corrected jointly with phase retrieval through numerical optimization appro… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

    MSC Class: 94A08 ACM Class: I.4.0

  48. arXiv:2405.16472  [pdf, other

    cs.LG

    Multi-Level Additive Modeling for Structured Non-IID Federated Learning

    Authors: Shutong Chen, Tianyi Zhou, Guodong Long, Jie Ma, Jing Jiang, Chengqi Zhang

    Abstract: The primary challenge in Federated Learning (FL) is to model non-IID distributions across clients, whose fine-grained structure is important to improve knowledge sharing. For example, some knowledge is globally shared across all clients, some is only transferable within a subgroup of clients, and some are client-specific. To capture and exploit this structure, we train models organized in a multi-… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  49. arXiv:2405.15973  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement

    Authors: Xiyao Wang, Jiuhai Chen, Zhaoyang Wang, Yuhang Zhou, Yiyang Zhou, Huaxiu Yao, Tianyi Zhou, Tom Goldstein, Parminder Bhatia, Furong Huang, Cao Xiao

    Abstract: Large vision-language models (LVLMs) have achieved impressive results in various visual question-answering and reasoning tasks through vision instruction tuning on specific datasets. However, there is still significant room for improvement in the alignment between visual and language modalities. Previous methods to enhance this alignment typically require external models or data, heavily depending… ▽ More

    Submitted 7 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 15 pages, 8 figures

  50. arXiv:2405.14767  [pdf, other

    q-fin.ST cs.CL cs.LG q-fin.TR

    FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models

    Authors: Hongyang Yang, Boyu Zhang, Neng Wang, Cheng Guo, Xiaoli Zhang, Likun Lin, Junlin Wang, Tianyu Zhou, Mao Guan, Runjia Zhang, Christina Dan Wang

    Abstract: As financial institutions and professionals increasingly incorporate Large Language Models (LLMs) into their workflows, substantial barriers, including proprietary data and specialized knowledge, persist between the finance sector and the AI community. These challenges impede the AI community's ability to enhance financial tasks effectively. Acknowledging financial analysis's critical role, we aim… ▽ More

    Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: FinRobot Whitepaper V1.0