
Showing 1–50 of 10,030 results for author: Wang, Y

Searching in archive cs.
  1. arXiv:2409.03757 [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

    Authors: Yunze Man, Shuhong Zheng, Zhipeng Bao, Martial Hebert, Liang-Yan Gui, Yu-Xiong Wang

    Abstract: Complex 3D scene understanding has gained increasing attention, with scene encoding strategies playing a crucial role in this success. However, the optimal scene encoding strategies for various scenarios remain unclear, particularly compared to their image-based counterparts. To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understandi…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Project page: https://yunzeman.github.io/lexicon3d, GitHub: https://github.com/YunzeMan/Lexicon3D

  2. arXiv:2409.03752 [pdf, other]

    cs.CL

    Attention Heads of Large Language Models: A Survey

    Authors: Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Bo Tang, Feiyu Xiong, Zhiyu Li

    Abstract: Since the advent of ChatGPT, Large Language Models (LLMs) have excelled in various tasks but remain largely black-box systems. Consequently, their development relies heavily on data-driven approaches, limiting performance enhancement through changes in internal architecture and reasoning pathways. As a result, many researchers have begun exploring the potential internal mechanisms of LLMs, aimi…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 20 pages, 11 figures, 4 tables

  3. arXiv:2409.03512 [pdf, other]

    cs.CY cs.CL

    From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

    Authors: Jifan Yu, Zheyuan Zhang, Daniel Zhang-li, Shangqing Tu, Zhanxin Hao, Rui Miao Li, Haoxuan Li, Yuanchun Wang, Hanming Li, Linlu Gong, Jie Cao, Jiayin Lin, Jinchang Zhou, Fei Qin, Haohua Wang, Jianxiao Jiang, Lijun Deng, Yisi Zhan, Chaojun Xiao, Xusheng Dai, Xuan Yan, Nianyi Lin, Nan Zhang, Ruixin Ni, Yang Dang, et al. (8 additional authors not shown)

    Abstract: Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still holds significant potential for improvement, new AI technologies have been continuously integ…

    Submitted 5 September, 2024; originally announced September 2024.

  4. arXiv:2409.03393 [pdf, other]

    cs.NI

    VQ-DeepVSC: A Dual-Stage Vector Quantization Framework for Video Semantic Communication

    Authors: Yongyi Miao, Zhongdang Li, Yang Wang, Die Hu, Jun Yan, Youfang Wang

    Abstract: In response to the rapid growth of global video traffic and the limitations of traditional wireless transmission systems, we propose a novel dual-stage vector quantization framework, VQ-DeepVSC, tailored to enhance video transmission over wireless channels. In the first stage, we design the adaptive keyframe extractor and interpolator, deployed respectively at the transmitter and receiver, which i…

    Submitted 5 September, 2024; originally announced September 2024.

  5. arXiv:2409.03386 [pdf, other]

    cs.IT eess.SP

    Movable Antennas: Channel Measurement, Modeling, and Performance Evaluation

    Authors: Yiqin Wang, Heyin Shen, Chong Han, Meixia Tao

    Abstract: For decades, multi-antenna has been a key enabling technology in the evolution of wireless communication systems. In contrast to conventional multi-antenna systems that contain antennas at fixed positions, position-flexible antenna systems have been proposed to fully utilize the spatial variation of wireless channels. In this paper, movable antenna (MA) systems are analyzed from channel me…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 12 pages, 31 figures

  6. arXiv:2409.03365 [pdf, other]

    cs.DC cs.LG

    Efficient Multi-Task Large Model Training via Data Heterogeneity-aware Model Management

    Authors: Yujie Wang, Shenhan Zhu, Fangcheng Fu, Xupeng Miao, Jie Zhang, Juan Zhu, Fan Hong, Yong Li, Bin Cui

    Abstract: Recent foundation models are capable of handling multiple machine learning (ML) tasks and multiple data modalities with the unified base model structure and several specialized model components. However, the development of such multi-task (MT) multi-modal (MM) models poses significant model management challenges to existing training systems. Due to the sophisticated model architecture and the hete…

    Submitted 5 September, 2024; originally announced September 2024.

  7. arXiv:2409.03363 [pdf, other]

    cs.CL

    Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding

    Authors: Cheng Wang, Yiwei Wang, Bryan Hooi, Yujun Cai, Nanyun Peng, Kai-Wei Chang

    Abstract: The training data in large language models is key to their success, but it also presents privacy and security risks, as it may contain sensitive information. Detecting pre-training data is crucial for mitigating these concerns. Existing methods typically analyze target text in isolation or solely with non-member contexts, overlooking potential insights from simultaneously considering both member a…

    Submitted 5 September, 2024; originally announced September 2024.

  8. arXiv:2409.03346 [pdf, other]

    cs.CL cs.AI

    Sketch: A Toolkit for Streamlining LLM Operations

    Authors: Xin Jiang, Xiang Li, Wenjia Ma, Xuezhi Fang, Yiqun Yao, Naitong Yu, Xuying Meng, Peng Han, Jing Li, Aixin Sun, Yequan Wang

    Abstract: Large language models (LLMs) represented by the GPT family have achieved remarkable success. The characteristics of LLMs lie in their ability to accommodate a wide range of tasks through a generative approach. However, the flexibility of their output format poses challenges in controlling and harnessing the model's outputs, thereby constraining the application of LLMs in various domains. In this work,…

    Submitted 5 September, 2024; originally announced September 2024.

  9. arXiv:2409.03271 [pdf, other]

    cs.AI cs.CL cs.HC

    Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation

    Authors: Yu Wang, Shiwan Zhao, Zhihu Wang, Heyuan Huang, Ming Fan, Yubo Zhang, Zhixing Wang, Haijun Wang, Ting Liu

    Abstract: The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs). However, despite their widespread adoption and success, CoT methods often exhibit instability due to their inability to consistently ensure the quality of generated reasoning paths, leading to sub-optimal reasoning performance. To address this challenge,…

    Submitted 5 September, 2024; originally announced September 2024.

  10. arXiv:2409.03223 [pdf, other]

    cs.CV

    Why mamba is effective? Exploit Linear Transformer-Mamba Network for Multi-Modality Image Fusion

    Authors: Chenguang Zhu, Shan Gao, Huafeng Chen, Guangqian Guo, Chaowei Wang, Yaoxing Wang, Chen Shu Lei, Quanjiang Fan

    Abstract: Multi-modality image fusion aims to integrate the merits of images from different sources and render high-quality fusion images. However, existing feature extraction and fusion methods are either constrained by inherent local reduction bias and static parameters during inference (CNN) or limited by quadratic computational complexity (Transformers), and cannot effectively extract and fuse features.…

    Submitted 4 September, 2024; originally announced September 2024.

  11. arXiv:2409.03179 [pdf, other]

    eess.IV cs.CV

    Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem

    Authors: Qiwen Zhu, Yanjie Wang, Shilv Cai, Liqun Chen, Jiahuan Zhou, Luxin Yan, Sheng Zhong, Xu Zou

    Abstract: Training Single-Image Super-Resolution (SISR) models using pixel-based regression losses can achieve high distortion metric scores (e.g., PSNR and SSIM), but often results in blurry images due to insufficient recovery of high-frequency details. Conversely, using GAN or perceptual losses can produce sharp images with high perceptual metric scores (e.g., LPIPS), but may introduce artifacts and inco…

    Submitted 4 September, 2024; originally announced September 2024.

  12. arXiv:2409.03021 [pdf, other]

    cs.CL cs.LG

    CLUE: Concept-Level Uncertainty Estimation for Large Language Models

    Authors: Yu-Hsiang Wang, Andrew Bai, Che-Ping Tsai, Cho-Jui Hsieh

    Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in various natural language generation (NLG) tasks. Previous studies suggest that LLMs' generation process involves uncertainty. However, existing approaches to uncertainty estimation mainly focus on sequence-level uncertainty, overlooking individual pieces of information within sequences. These methods fall short in separately…

    Submitted 4 September, 2024; originally announced September 2024.

  13. arXiv:2409.02813 [pdf, other]

    cs.CL cs.CV

    MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

    Authors: Xiang Yue, Tianyu Zheng, Yuansheng Ni, Yubo Wang, Kai Zhang, Shengbang Tong, Yuxuan Sun, Ming Yin, Botao Yu, Ge Zhang, Huan Sun, Yu Su, Wenhu Chen, Graham Neubig

    Abstract: This paper introduces MMMU-Pro, a robust version of the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark. MMMU-Pro rigorously assesses multimodal models' true understanding and reasoning capabilities through a three-step process based on MMMU: (1) filtering out questions answerable by text-only models, (2) augmenting candidate options, and (3) introducing a vision-o…

    Submitted 4 September, 2024; originally announced September 2024.

  14. arXiv:2409.02751 [pdf, other]

    cs.CL

    A Comparative Study of Pre-training and Self-training

    Authors: Yiheng Wang, Jiayu Lin, Zuoquan Lin

    Abstract: Pre-training and self-training are two approaches to semi-supervised learning. The comparison between pre-training and self-training has been explored. However, the previous works led to confusing findings: self-training outperforms pre-training on some tasks in computer vision, and, conversely, pre-training outperforms self-training on some tasks in natural language process…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 19 pages, 2 figures, 9 tables

  15. arXiv:2409.02728 [pdf, ps, other]

    cs.LG cs.SI eess.SP

    Task-Oriented Communication for Graph Data: A Graph Information Bottleneck Approach

    Authors: Shujing Li, Yanhu Wang, Shuaishuai Guo, Chenyuan Feng

    Abstract: Graph data, essential in fields like knowledge representation and social networks, often involves large networks with many nodes and edges. Transmitting these graphs can be highly inefficient due to their size and redundancy for specific tasks. This paper introduces a method to extract a smaller, task-focused subgraph that maintains key information while reducing communication overhead. Our approa…

    Submitted 4 September, 2024; originally announced September 2024.

  16. arXiv:2409.02718 [pdf, other]

    cs.CR cs.CL

    Alignment-Aware Model Extraction Attacks on Large Language Models

    Authors: Zi Liang, Qingqing Ye, Yanyun Wang, Sen Zhang, Yaxin Xiao, Ronghua Li, Jianliang Xu, Haibo Hu

    Abstract: Model extraction attacks (MEAs) on large language models (LLMs) have received increasing research attention lately. Existing attack methods on LLMs inherit the extraction strategies from those designed for deep neural networks (DNNs) yet neglect the inconsistency of training tasks between MEA and LLMs' alignments. As such, they result in poor attack performance. To tackle this issue, we present L…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Source code: https://github.com/liangzid/alignmentExtraction

  17. arXiv:2409.02657 [pdf, other]

    cs.CV cs.AI cs.MM

    PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation

    Authors: Jun Ling, Yiwen Wang, Han Xue, Rong Xie, Li Song

    Abstract: While previous audio-driven talking head generation (THG) methods generate head poses from driving audio, the generated poses or lips cannot match the audio well or are not editable. In this study, we propose PoseTalk, a THG system that can freely generate lip-synchronized talking head videos with free head poses conditioned on text prompts and audio. The core insight of our method is usi…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 7+5 pages, 15 figures

  18. arXiv:2409.02611 [pdf, other]

    cs.CV

    GoT-CQA: Graph-of-Thought Guided Compositional Reasoning for Chart Question Answering

    Authors: Lingling Zhang, Muye Huang, QianYing Wang, Yaxian Wang, Wenjun Wu, Jun Liu

    Abstract: Chart Question Answering (CQA) aims at answering questions based on the visual chart content, which plays an important role in chart summarization, business data analysis, and data report generation. CQA is a challenging multi-modal task because of the strong context dependence and complex reasoning requirement. The former refers to answering this question strictly based on the analysis of the visu…

    Submitted 4 September, 2024; originally announced September 2024.

  19. arXiv:2409.02599 [pdf, other]

    cs.IR cs.CV cs.LG

    A Fashion Item Recommendation Model in Hyperbolic Space

    Authors: Ryotaro Shimizu, Yu Wang, Masanari Kimura, Yuki Hirakawa, Takashi Wada, Yuki Saito, Julian McAuley

    Abstract: In this work, we propose a fashion item recommendation model that incorporates hyperbolic geometry into user and item representations. Using hyperbolic space, our model aims to capture implicit hierarchies among items based on their visual data and users' purchase history. During training, we apply a multi-task learning framework that considers both hyperbolic and Euclidean distances in the loss f…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: This work was presented at the CVFAD Workshop at CVPR 2024

  20. arXiv:2409.02595 [pdf, ps, other]

    cs.LO

    Computation and Concurrency

    Authors: Yong Wang

    Abstract: We try to clarify the relationship between computation and concurrency. Based on the so-called truly concurrent automata, we introduce communication and more operators, and establish the algebras modulo language equivalence and bisimilarity.

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: arXiv admin note: text overlap with arXiv:2304.04406

  21. arXiv:2409.02565 [pdf, other]

    eess.AS cs.SD

    Efficient Extraction of Noise-Robust Discrete Units from Self-Supervised Speech Models

    Authors: Jakob Poncelet, Yujun Wang, Hugo Van hamme

    Abstract: Continuous speech can be converted into a discrete sequence by deriving discrete units from the hidden features of self-supervised learned (SSL) speech models. Although SSL models are becoming larger and trained on more data, they are often sensitive to real-life distortions like additive noise or reverberation, which translates to a shift in discrete units. We propose a parameter-efficient approa…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Accepted at SLT2024

  22. arXiv:2409.02395 [pdf, other]

    physics.med-ph cs.RO

    Deep Brain Ultrasound Ablation Thermal Dose Modeling with in Vivo Experimental Validation

    Authors: Zhanyue Zhao, Benjamin Szewczyk, Matthew Tarasek, Charles Bales, Yang Wang, Ming Liu, Yiwei Jiang, Chitresh Bhushan, Eric Fiveland, Zahabiya Campwala, Rachel Trowbridge, Phillip M. Johansen, Zachary Olmsted, Goutam Ghoshal, Tamas Heffter, Katie Gandomi, Farid Tavakkolmoghaddam, Christopher Nycz, Erin Jeannotte, Shweta Mane, Julia Nalwalk, E. Clif Burdette, Jiang Qian, Desmond Yeo, Julie Pilitsis, et al. (1 additional author not shown)

    Abstract: Intracorporeal needle-based therapeutic ultrasound (NBTU) is a minimally invasive option for intervening in malignant brain tumors, commonly used in thermal ablation procedures. This technique is suitable for both primary and metastatic cancers, utilizing a high-frequency alternating electric field (up to 10 MHz) to excite a piezoelectric transducer. The resulting rapid deformation of the transduc…

    Submitted 4 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

    Comments: 9 pages, 9 figures, 7 tables

  23. arXiv:2409.02382 [pdf, other]

    cs.CV

    GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving

    Authors: Huasong Han, Kaixuan Zhou, Xiaoxiao Long, Yusen Wang, Chunxia Xiao

    Abstract: We propose GGS, a Generalizable Gaussian Splatting method for Autonomous Driving which can achieve realistic rendering under large viewpoint changes. Previous generalizable 3D Gaussian splatting methods are limited to rendering novel views that are very close to the original pair of images, which cannot handle large differences in viewpoint. Especially in autonomous driving scenarios, images are t…

    Submitted 3 September, 2024; originally announced September 2024.

  24. arXiv:2409.02139 [pdf, other]

    cs.LG cs.AI cs.CR

    The Role of Transformer Models in Advancing Blockchain Technology: A Systematic Survey

    Authors: Tianxu Liu, Yanbin Wang, Jianguo Sun, Ye Tian, Yanyu Huang, Tao Xue, Peiyue Li, Yiwei Liu

    Abstract: As blockchain technology rapidly evolves, the demand for enhanced efficiency, security, and scalability grows. Transformer models, as powerful deep learning architectures, have shown unprecedented potential in addressing various blockchain challenges. However, a systematic review of Transformer applications in blockchain is lacking. This paper aims to fill this research gap by surveying over 200 rel…

    Submitted 5 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

  25. arXiv:2409.02078 [pdf, other]

    cs.CL

    Political DEBATE: Efficient Zero-shot and Few-shot Classifiers for Political Text

    Authors: Michael Burnham, Kayla Kahn, Ryan Yank Wang, Rachel X. Peng

    Abstract: Social scientists quickly adopted large language models due to their ability to annotate documents without supervised training, an ability known as zero-shot learning. However, due to their compute demands, cost, and often proprietary nature, these models are often at odds with replication and open science standards. This paper introduces the Political DEBATE (DeBERTa Algorithm for Textual Entailm…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 26 pages, 5 figures

  26. arXiv:2409.01944 [pdf, other]

    cs.CL

    FuzzCoder: Byte-level Fuzzing Test via Large Language Model

    Authors: Liqun Yang, Jian Yang, Chaoren Wei, Guanglin Niu, Ge Zhang, Yunli Wang, Linzheng ChaI, Wanxu Xia, Hongcheng Guo, Shun Zhang, Jiaheng Liu, Yuwei Yin, Junran Peng, Jiaxin Ma, Liang Sun, Zhoujun Li

    Abstract: Fuzzing is an important dynamic program analysis technique designed for finding vulnerabilities in complex software. Fuzzing involves presenting a target program with crafted malicious input to cause crashes, buffer overflows, memory errors, and exceptions. Crafting malicious inputs in an efficient manner is a difficult open problem, and the best approaches often apply uniform random mutations to p…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 11 pages

  27. arXiv:2409.01931 [pdf, other]

    physics.chem-ph cs.AI cs.LG physics.bio-ph physics.comp-ph

    On the design space between molecular mechanics and machine learning force fields

    Authors: Yuanqing Wang, Kenichiro Takaba, Michael S. Chen, Marcus Wieder, Yuzhi Xu, Tong Zhu, John Z. H. Zhang, Arnav Nagle, Kuang Yu, Xinyan Wang, Daniel J. Cole, Joshua A. Rackers, Kyunghyun Cho, Joe G. Greener, Peter Eastman, Stefano Martiniani, Mark E. Tuckerman

    Abstract: A force field as accurate as quantum mechanics (QM) and as fast as molecular mechanics (MM), with which one can simulate a biomolecular system efficiently enough and meaningfully enough to get quantitative insights, is among the most ardent dreams of biophysicists -- a dream, nevertheless, not to be fulfilled any time soon. Machine learning force fields (MLFFs) represent a meaningful endeavor towa…

    Submitted 5 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

  28. arXiv:2409.01867 [pdf, other]

    cs.HC

    ASD-Chat: An Innovative Dialogue Intervention System for Children with Autism based on LLM and VB-MAPP

    Authors: Chengyun Deng, Shuzhong Lai, Chi Zhou, Mengyi Bao, Jingwen Yan, Haifeng Li, Lin Yao, Yueming Wang

    Abstract: Early diagnosis and professional intervention can help children with autism spectrum disorder (ASD) return to normal life. However, the scarcity and imbalance of professional medical resources currently prevent many autistic children from receiving the necessary diagnosis and intervention. Therefore, numerous paradigms have been proposed that use computer technology to assist or independently cond…

    Submitted 3 September, 2024; originally announced September 2024.

  29. arXiv:2409.01816 [pdf, other]

    cs.CV

    GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection

    Authors: Jinqing Zhang, Yanan Zhang, Yunlong Qi, Zehua Fu, Qingjie Liu, Yunhong Wang

    Abstract: Bird's-Eye-View (BEV) representation has emerged as a mainstream paradigm for multi-view 3D object detection, demonstrating impressive perceptual capabilities. However, existing methods overlook the geometric quality of BEV representation, leaving it in a low-resolution state and failing to restore the authentic geometric information of the scene. In this paper, we identify the reasons why previou…

    Submitted 3 September, 2024; originally announced September 2024.

  30. arXiv:2409.01787 [pdf, other]

    cs.CL

    LLM-GAN: Construct Generative Adversarial Network Through Large Language Models For Explainable Fake News Detection

    Authors: Yifeng Wang, Zhouhong Gu, Siwei Zhang, Suhang Zheng, Tao Wang, Tianyu Li, Hongwei Feng, Yanghua Xiao

    Abstract: Explainable fake news detection predicts the authenticity of news items with annotated explanations. Today, Large Language Models (LLMs) are known for their powerful natural language understanding and explanation generation abilities. However, applying LLMs to explainable fake news detection poses two main challenges. Firstly, fake news appears reasonable and could easily mislead LLMs, leavin…

    Submitted 3 September, 2024; originally announced September 2024.

  31. arXiv:2409.01780 [pdf, other]

    cs.CL

    State-of-the-art Advances of Deep-learning Linguistic Steganalysis Research

    Authors: Yihao Wang, Ru Zhang, Yifan Tang, Jianyi Liu

    Abstract: With the evolution of generative linguistic steganography techniques, conventional steganalysis falls short in robustly quantifying the alterations induced by steganography, thereby complicating detection. Consequently, the research paradigm has pivoted towards deep-learning-based linguistic steganalysis. This study offers a comprehensive review of existing contributions and evaluates prevailing d…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by 2023 International Conference on Data, Information and Computing Science

    Report number: no. 316

  32. arXiv:2409.01557 [pdf, other]

    cs.CV

    TASL-Net: Tri-Attention Selective Learning Network for Intelligent Diagnosis of Bimodal Ultrasound Video

    Authors: Chengqian Zhao, Zhao Yao, Zhaoyu Hu, Yuanxin Xie, Yafang Zhang, Yuanyuan Wang, Shuo Li, Jianhua Zhou, Jianqiao Zhou, Yin Wang, Jinhua Yu

    Abstract: In the intelligent diagnosis of bimodal (gray-scale and contrast-enhanced) ultrasound videos, medical domain knowledge such as the way sonographers browse videos, the particular areas they emphasize, and the features they pay special attention to, plays a decisive role in facilitating precise diagnosis. Embedding medical knowledge into the deep learning network can not only enhance performance but…

    Submitted 2 September, 2024; originally announced September 2024.

  33. It is Time to Develop an Auditing Framework to Promote Value Aware Chatbots

    Authors: Yanchen Wang, Lisa Singh

    Abstract: The launch of ChatGPT in November 2022 marked the beginning of a new era in AI: the availability of generative AI tools for everyone to use. ChatGPT and other similar chatbots boast a wide range of capabilities from answering student homework questions to creating music and art. Given the large amounts of human data chatbots are built on, it is inevitable that they will inherit human errors and bi…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2306.07500

    Journal ref: 13th International Conference on Data Science, Technology and Applications (DATA 2024), pages 460-470

  34. arXiv:2409.01502 [pdf, other]

    cs.CV cs.AI cs.GR

    AMG: Avatar Motion Guided Video Generation

    Authors: Zhangsihao Yang, Mengyi Shan, Mohammad Farazi, Wenhui Zhu, Yanxi Chen, Xuanzhao Dong, Yalin Wang

    Abstract: The human video generation task has gained significant attention with the advancement of deep generative models. Generating realistic videos with human movements is challenging in nature, due to the intricacies of human body topology and sensitivity to visual artifacts. The extensively studied 2D media generation methods take advantage of massive human media datasets, but struggle with 3D-aware contro…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: The project page is at https://github.com/zshyang/amg

  35. arXiv:2409.01151 [pdf, other]

    cs.CV cs.LG

    Understanding Multimodal Hallucination with Parameter-Free Representation Alignment

    Authors: Yueqian Wang, Jianxin Liang, Yuxuan Wang, Huishuai Zhang, Dongyan Zhao

    Abstract: Hallucination is a common issue in Multimodal Large Language Models (MLLMs), yet the underlying principles remain poorly understood. In this paper, we investigate which components of MLLMs contribute to object hallucinations. To analyze image representations while completely avoiding the influence of all factors other than the image representation itself, we propose a parameter-free represe…

    Submitted 2 September, 2024; originally announced September 2024.

  36. arXiv:2409.01071 [pdf, other]

    cs.CV cs.CL

    VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

    Authors: Yuxuan Wang, Cihang Xie, Yang Liu, Zilong Zheng

    Abstract: Recent advancements in large-scale video-language models have shown significant potential for real-time planning and detailed interactions. However, their high computational demands and the scarcity of annotated datasets limit their practicality for academic researchers. In this work, we introduce VideoLLaMB, a novel framework that utilizes temporal memory tokens within bridge layers to allow for…

    Submitted 2 September, 2024; originally announced September 2024.

  37. arXiv:2409.01060 [pdf]

    cs.CE

    Multiagent Reinforcement Learning Enhanced Decision-making of Crew Agents During Floor Construction Process

    Authors: Bin Yang, Boda Liu, Yilong Han, Xin Meng, Yifan Wang, Hansi Yang, Jianzhuang Xia

    Abstract: Fine-grained simulation of floor construction processes is essential for supporting lean management and the integration of information technology. However, existing research does not adequately address the on-site decision-making of constructors in selecting tasks and determining their sequence within the entire construction process. Moreover, decision-making frameworks from computer science and r…

    Submitted 2 September, 2024; originally announced September 2024.

  38. arXiv:2409.00920 [pdf, other]

    cs.LG cs.AI cs.CL

    ToolACE: Winning the Points of LLM Function Calling

    Authors: Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong Liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, et al. (2 additional authors not shown)

    Abstract: Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic ag…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 21 pages, 22 figures

  39. arXiv:2409.00917 [pdf, other]

    cs.CV

    Large Scale Unsupervised Brain MRI Image Registration Solution for Learn2Reg 2024

    Authors: Yuxi Zhang, Xiang Chen, Jiazheng Wang, Min Liu, Yaonan Wang, Dongdong Liu, Renjiu Hu, Hang Zhang

    Abstract: In this paper, we summarize the methods and experimental results we proposed for Task 2 in the Learn2Reg 2024 Challenge. This task focuses on unsupervised registration of anatomical structures in brain MRI images between different patients. The difficulty lies in: (1) the absence of segmentation labels, and (2) the large amount of data. To address these challenges, we built an efficient backbone network an…

    Submitted 4 September, 2024; v1 submitted 1 September, 2024; originally announced September 2024.

    Comments: MICCAI Learn2Reg 2024 Challenge & WBIR 2024 Workshop on Biomedical Imaging Registration

  40. arXiv:2409.00904 [pdf, other]

    cs.CV cs.AI

    Multi-scale Temporal Fusion Transformer for Incomplete Vehicle Trajectory Prediction

    Authors: Zhanwen Liu, Chao Li, Yang Wang, Nan Yang, Xing Fan, Jiaqi Ma, Xiangmo Zhao

    Abstract: Motion prediction plays an essential role in autonomous driving systems, enabling autonomous vehicles to achieve more accurate local-path planning and driving decisions based on predictions of the surrounding vehicles. However, existing methods neglect the potential missing values caused by object occlusion, perception failures, etc., which inevitably degrade the trajectory prediction performance…

    Submitted 1 September, 2024; originally announced September 2024.

  41. arXiv:2409.00854 [pdf, other]

    cs.PF

    Scaler: Efficient and Effective Cross Flow Analysis

    Authors: Steven, Tang, Mingcan Xiang, Yang Wang, Bo Wu, Jianjun Chen, Tongping Liu

    Abstract: Performance analysis is challenging as different components (e.g., different libraries and applications) of a complex system can interact with each other. However, few existing tools focus on understanding such interactions. To bridge this gap, we propose a novel analysis method "Cross Flow Analysis (XFA)" that monitors the interactions/flows across these components. We also built the Scaler profi…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Paper has been accepted by ASE'24

  42. arXiv:2409.00787  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs

    Authors: Bocheng Chen, Hanqing Guo, Guangjing Wang, Yuanda Wang, Qiben Yan

    Abstract: Large Language Models (LLMs) have demonstrated great capabilities in natural language understanding and generation, largely attributed to the intricate alignment process using human feedback. While alignment has become an essential training component that leverages data collected from user queries, it inadvertently opens up an avenue for a new type of user-guided poisoning attack. In this paper,…

    Submitted 1 September, 2024; originally announced September 2024.

  43. arXiv:2409.00750  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

    Authors: Yuancheng Wang, Haoyue Zhan, Liwei Liu, Ruihong Zeng, Haotian Guo, Jiachen Zheng, Qiang Zhang, Shunsi Zhang, Zhizheng Wu

    Abstract: Nowadays, large-scale text-to-speech (TTS) systems are primarily divided into two types: autoregressive and non-autoregressive. The autoregressive systems have certain deficiencies in robustness and cannot control speech duration. In contrast, non-autoregressive systems require explicit prediction of phone-level duration, which may compromise their naturalness. We introduce the Masked Generative C…

    Submitted 1 September, 2024; originally announced September 2024.

  44. arXiv:2409.00727  [pdf, other]

    cs.AI cs.CL cs.IR

    Hound: Hunting Supervision Signals for Few and Zero Shot Node Classification on Text-attributed Graph

    Authors: Yuxiang Wang, Xiao Yan, Shiyu Jin, Quanqing Xu, Chuanhui Yang, Yuanyuan Zhu, Chuang Hu, Bo Du, Jiawei Jiang

    Abstract: Text-attributed graphs (TAGs) are an important type of graph-structured data with text descriptions for each node. Few- and zero-shot node classification on TAGs has many applications in fields such as academia and social networks. However, the two tasks are challenging due to the lack of supervision signals, and existing methods only use the contrastive loss to align graph-based node embedding and…

    Submitted 1 September, 2024; originally announced September 2024.

  45. arXiv:2409.00643  [pdf, other]

    cs.RO

    Learning to Singulate Objects in Packed Environments using a Dexterous Hand

    Authors: Hao Jiang, Yuhai Wang, Hanyang Zhou, Daniel Seita

    Abstract: Robotic object singulation, where a robot must isolate, grasp, and retrieve a target object in a cluttered environment, is a fundamental challenge in robotic manipulation. This task is difficult due to occlusions and how other objects act as obstacles for manipulation. A robot must also reason about the effect of object-object interactions as it tries to singulate the target. Prior work has explor…

    Submitted 1 September, 2024; originally announced September 2024.

  46. arXiv:2409.00509  [pdf, other]

    cs.CL

    LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

    Authors: Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi

    Abstract: Large language models (LLMs) face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences. Meanwhile, extending the context window in LLMs through post-pretraining is highly resource-intensive. To address this, we introduce LongRecipe, an efficient training s…

    Submitted 4 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: Work in Progress

  47. arXiv:2409.00426  [pdf, other]

    cs.CR

    Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks

    Authors: Yu He, Boheng Li, Yao Wang, Mengda Yang, Juan Wang, Hongxin Hu, Xingyu Zhao

    Abstract: The vulnerability of machine learning models to Membership Inference Attacks (MIAs) has garnered considerable attention in recent years. These attacks determine whether a data sample belongs to the model's training set or not. Recent research has focused on reference-based attacks, which leverage difficulty calibration with independently trained reference models. While empirical studies have demon…

    Submitted 4 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: Accepted by ACM CCS 2024

  48. arXiv:2409.00387  [pdf, other]

    eess.AS cs.SD

    Progressive Residual Extraction based Pre-training for Speech Representation Learning

    Authors: Tianrui Wang, Jin Li, Ziyang Ma, Rui Cao, Xie Chen, Longbiao Wang, Meng Ge, Xiaobao Wang, Yuguang Wang, Jianwu Dang, Nyima Tashi

    Abstract: Self-supervised learning (SSL) has garnered significant attention in speech processing, excelling in linguistic tasks such as speech recognition. However, jointly improving the performance of pre-trained models on various downstream tasks, each requiring different speech information, poses significant challenges. To this end, we propose a progressive residual extraction based self-supervised l…

    Submitted 31 August, 2024; originally announced September 2024.

  49. arXiv:2409.00342  [pdf, other]

    cs.CV

    AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation

    Authors: Zanlin Ni, Yulin Wang, Renping Zhou, Rui Lu, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Yuan Yao, Gao Huang

    Abstract: Recent studies have demonstrated the effectiveness of token-based methods for visual content generation. As a representative work, non-autoregressive Transformers (NATs) are able to synthesize images with decent quality in a small number of steps. However, NATs usually necessitate configuring a complicated generation policy comprising multiple manually-designed scheduling rules. These heuristic-dr…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: Accepted by ECCV 2024

  50. arXiv:2409.00206  [pdf, other]

    cs.CV cs.RO

    RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning

    Authors: Sha Lu, Xuecheng Xu, Yuxuan Wu, Haojian Lu, Xieyuanli Chen, Rong Xiong, Yue Wang

    Abstract: Global localization using onboard perception sensors, such as cameras and LiDARs, is crucial in autonomous driving and robotics applications when GPS signals are unreliable. Most approaches achieve global localization by sequential place recognition and pose estimation. Some of them train separate models for each task, while others employ a single model with dual heads, trained jointly with separa…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: 23 pages, 19 figures