Showing 1–50 of 5,796 results for author: Chen, Y

  1. arXiv:2409.03715  [pdf, other]

    cs.SD cs.AI eess.AS

    Applications and Advances of Artificial Intelligence in Music Generation: A Review

    Authors: Yanxu Chen, Linshu Huang, Tian Gou

    Abstract: In recent years, artificial intelligence (AI) has made significant progress in the field of music generation, driving innovation in music creation and applications. This paper provides a systematic review of the latest research advancements in AI music generation, covering key technologies, models, datasets, evaluation methods, and their practical applications across various fields. The main contr…

    Submitted 3 September, 2024; originally announced September 2024.

  2. arXiv:2409.03576  [pdf, ps, other]

    cs.IT

    Weight enumerators of self-dual quantum codes

    Authors: Yin Chen, Shan Ren

    Abstract: We use algebraic invariant theory to study three weight enumerators of self-dual quantum codes over finite fields. We show that the weight enumerators of self-dual quantum codes can be expressed algebraically by two polynomials and the double weight enumerators of self-dual quantum codes can be expressed algebraically by five polynomials. We also explicitly compute the complete weight enumerators…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 17 pages

    MSC Class: 94B50; 13A50

  3. arXiv:2409.03403  [pdf, other]

    cs.RO

    RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning

    Authors: Lawrence Yunliang Chen, Chenfeng Xu, Karthik Dharmarajan, Zubair Irshad, Richard Cheng, Kurt Keutzer, Masayoshi Tomizuka, Quan Vuong, Ken Goldberg

    Abstract: Scaling up robot learning requires large and diverse datasets, and how to efficiently reuse collected data and transfer policies to new embodiments remains an open question. Emerging research such as the Open-X Embodiment (OXE) project has shown promise in leveraging skills by combining datasets including different robots. However, imbalances in the distribution of robot types and camera angles in…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: CoRL 2024 (Oral)

  4. arXiv:2409.03251  [pdf, other]

    cs.HC cs.LG eess.SY

    Dual-TSST: A Dual-Branch Temporal-Spectral-Spatial Transformer Model for EEG Decoding

    Authors: Hongqi Li, Haodong Zhang, Yitong Chen

    Abstract: The decoding of electroencephalography (EEG) signals allows access to user intentions conveniently, which plays an important role in the fields of human-machine interaction. To effectively extract sufficient characteristics of the multichannel EEG, a novel decoding architecture network with a dual-branch temporal-spectral-spatial transformer (Dual-TSST) is proposed in this study. Specifically, by…

    Submitted 5 September, 2024; originally announced September 2024.

  5. arXiv:2409.03218  [pdf, other]

    cs.PF cs.LG

    Application Research On Real-Time Perception Of Device Performance Status

    Authors: Zhe Wang, Zhen Wang, Jianwen Wu, Wangzhong Xiao, Yidong Chen, Zihua Feng, Dian Yang, Hongchen Liu, Bo Liang, Jiaojiao Fu

    Abstract: In order to accurately identify the performance status of mobile devices and finely adjust the user experience, a real-time performance perception evaluation method based on TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) combined with entropy weighting method and time series model construction was studied. After collecting the performance characteristics of various mobile…

    Submitted 4 September, 2024; originally announced September 2024.

  6. arXiv:2409.03193  [pdf, other]

    cs.RO

    Upper-Limb Rehabilitation with a Dual-Mode Individualized Exoskeleton Robot: A Generative-Model-Based Solution

    Authors: Yu Chen, Shu Miao, Jing Ye, Gong Chen, Jianghua Cheng, Ketao Du, Xiang Li

    Abstract: Several upper-limb exoskeleton robots have been developed for stroke rehabilitation, but their rather low level of individualized assistance typically limits their effectiveness and practicability. Individualized assistance involves an upper-limb exoskeleton robot continuously assessing feedback from a stroke patient and then meticulously adjusting interaction forces to suit specific conditions an…

    Submitted 4 September, 2024; originally announced September 2024.

  7. arXiv:2409.02977  [pdf, other]

    cs.SE cs.AI

    Large Language Model-Based Agents for Software Engineering: A Survey

    Authors: Junwei Liu, Kaixin Wang, Yixuan Chen, Xin Peng, Zhenpeng Chen, Lingming Zhang, Yiling Lou

    Abstract: The recent advance in Large Language Models (LLMs) has shaped a new paradigm of AI agents, i.e., LLM-based agents. Compared to standalone LLMs, LLM-based agents substantially extend the versatility and expertise of LLMs by enhancing LLMs with the capabilities of perceiving and utilizing external resources and tools. To date, LLM-based agents have been applied and shown remarkable effectiveness in…

    Submitted 4 September, 2024; originally announced September 2024.

  8. arXiv:2409.02908  [pdf, other]

    cs.LG cs.AI cs.CL

    Masked Diffusion Models are Secretly Time-Agnostic Masked Models and Exploit Inaccurate Categorical Sampling

    Authors: Kaiwen Zheng, Yongxin Chen, Hanzi Mao, Ming-Yu Liu, Jun Zhu, Qinsheng Zhang

    Abstract: Masked diffusion models (MDMs) have emerged as a popular research topic for generative modeling of discrete data, thanks to their superior performance over other discrete diffusion models, and are rivaling the auto-regressive models (ARMs) for language modeling tasks. The recent effort in simplifying the masked diffusion framework further leads to alignment with continuous-space diffusion models a…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 40 pages

  9. arXiv:2409.02877  [pdf, other]

    cs.AI cs.CL cs.LG

    Configurable Foundation Models: Building LLMs from a Modular Perspective

    Authors: Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

    Abstract: Advancements in LLMs have recently unveiled challenges tied to computational efficiency and continual scalability due to their requirements of huge parameters, making the applications and evolution of these models on devices with limited computation resources and scenarios requiring various abilities increasingly cumbersome. Inspired by modularity within the human brain, there is a growing tendenc…

    Submitted 4 September, 2024; originally announced September 2024.

  10. arXiv:2409.02616  [pdf, other]

    cs.IT

    Group Information Geometry Approach for Ultra-Massive MIMO Signal Detection

    Authors: Jiyuan Yang, Yan Chen, Xiqi Gao, Xiang-Gen Xia, Dirk Slock

    Abstract: We propose a group information geometry approach (GIGA) for ultra-massive multiple-input multiple-output (MIMO) signal detection. The signal detection task is framed as computing the approximate marginals of the a posteriori distribution of the transmitted data symbols of all users. With the approximate marginals, we perform the maximization of the a posteriori marginals (MPM) detection…

    Submitted 4 September, 2024; originally announced September 2024.

  11. arXiv:2409.02492  [pdf]

    cs.CV cs.LG eess.IV

    Reliable Deep Diffusion Tensor Estimation: Rethinking the Power of Data-Driven Optimization Routine

    Authors: Jialong Li, Zhicheng Zhang, Yunwei Chen, Qiqi Lu, Ye Wu, Xiaoming Liu, QianJin Feng, Yanqiu Feng, Xinyuan Zhang

    Abstract: Diffusion tensor imaging (DTI) holds significant importance in clinical diagnosis and neuroscience research. However, conventional model-based fitting methods often suffer from sensitivity to noise, leading to decreased accuracy in estimating DTI parameters. While traditional data-driven deep learning methods have shown potential in terms of accuracy and efficiency, their limited generalization to…

    Submitted 4 September, 2024; originally announced September 2024.

  12. arXiv:2409.02438  [pdf, other]

    cs.CV

    Non-target Divergence Hypothesis: Toward Understanding Domain Gaps in Cross-Modal Knowledge Distillation

    Authors: Yilong Chen, Zongyi Xu, Xiaoshui Huang, Shanshan Zhao, Xinqi Jiang, Xinyu Gao, Xinbo Gao

    Abstract: Compared to single-modal knowledge distillation, cross-modal knowledge distillation faces more severe challenges due to domain gaps between modalities. Although various methods have proposed various solutions to overcome these challenges, there is still limited research on how domain gaps affect cross-modal knowledge distillation. This paper provides an in-depth analysis and evaluation of this iss…

    Submitted 4 September, 2024; originally announced September 2024.

  13. arXiv:2409.02375  [pdf, other]

    cs.CL

    How Privacy-Savvy Are Large Language Models? A Case Study on Compliance and Privacy Technical Review

    Authors: Xichou Zhu, Yang Liu, Zhou Shen, Yi Liu, Min Li, Yujun Chen, Benzi John, Zhenzhen Ma, Tao Hu, Bolong Yang, Manman Wang, Zongxing Xie, Peng Liu, Dan Cai, Junhui Wang

    Abstract: The recent advances in large language models (LLMs) have significantly expanded their applications across various fields such as language generation, summarization, and complex question answering. However, their application to privacy compliance and technical privacy reviews remains under-explored, raising critical concerns about their ability to adhere to global privacy standards and protect sens…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 8 pages, 4 figures

  14. arXiv:2409.02370  [pdf, other]

    cs.CL cs.AI

    Do Large Language Models Possess Sensitive to Sentiment?

    Authors: Yang Liu, Xichou Zhu, Zhou Shen, Yi Liu, Min Li, Yujun Chen, Benzi John, Zhenzhen Ma, Tao Hu, Zhiyang Xu, Wei Luo, Junhui Wang

    Abstract: Large Language Models (LLMs) have recently displayed their extraordinary capabilities in language understanding. However, how to comprehensively assess the sentiment capabilities of LLMs continues to be a challenge. This paper investigates the ability of LLMs to detect and react to sentiment in text modal. As the integration of LLMs into diverse applications is on the rise, it becomes highly criti…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 10 pages, 2 figures

  15. arXiv:2409.02123  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    PuYun: Medium-Range Global Weather Forecasting Using Large Kernel Attention Convolutional Networks

    Authors: Shengchen Zhu, Yiming Chen, Peiying Yu, Xiang Qu, Yuxiao Zhou, Yiming Ma, Zhizhan Zhao, Yukai Liu, Hao Mi, Bin Wang

    Abstract: Accurate weather forecasting is essential for understanding and mitigating weather-related impacts. In this paper, we present PuYun, an autoregressive cascade model that leverages large kernel attention convolutional networks. The model's design inherently supports extended weather prediction horizons while broadening the effective receptive field. The integration of large kernel attention mechani…

    Submitted 1 September, 2024; originally announced September 2024.

  16. arXiv:2409.02074  [pdf, other]

    cs.CR cs.HC cs.LG cs.SE

    RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer

    Authors: Jiangyi Deng, Xinfeng Li, Yanjiao Chen, Yijie Bai, Haiqin Weng, Yan Liu, Tao Wei, Wenyuan Xu

    Abstract: Malicious shell commands are linchpins to many cyber-attacks, but may not be easy to understand by security analysts due to complicated and often disguised code structures. Advances in large language models (LLMs) have unlocked the possibility of generating understandable explanations for shell commands. However, existing general-purpose LLMs suffer from a lack of expert knowledge and a tendency t…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by NDSS Symposium 2025. Please cite this paper as "Jiangyi Deng, Xinfeng Li, Yanjiao Chen, Yijie Bai, Haiqin Weng, Yan Liu, Tao Wei, Wenyuan Xu. RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer. In the 32nd Annual Network and Distributed System Security Symposium (NDSS 2025)."

  17. arXiv:2409.01966  [pdf, other]

    cs.CV

    MetaFood3D: Large 3D Food Object Dataset with Nutrition Values

    Authors: Yuhao Chen, Jiangpeng He, Chris Czarnecki, Gautham Vinod, Talha Ibn Mahmud, Siddeshwar Raghavan, Jinge Ma, Dayou Mao, Saeejith Nair, Pengcheng Xi, Alexander Wong, Edward Delp, Fengqing Zhu

    Abstract: Food computing is both important and challenging in computer vision (CV). It significantly contributes to the development of CV algorithms due to its frequent presence in datasets across various applications, ranging from classification and instance segmentation to 3D reconstruction. The polymorphic shapes and textures of food, coupled with high variation in forms and vast multimodal information,…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Dataset is coming soon

  18. arXiv:2409.01695  [pdf, other]

    cs.SD cs.AI eess.AS

    USTC-KXDIGIT System Description for ASVspoof5 Challenge

    Authors: Yihao Chen, Haochen Wu, Nan Jiang, Xiang Xia, Qing Gu, Yunqi Hao, Pengfei Cai, Yu Guan, Jialong Wang, Weilin Xie, Lei Fang, Sian Fang, Yan Song, Wu Guo, Lin Liu, Minqiang Xu

    Abstract: This paper describes the USTC-KXDIGIT system submitted to the ASVspoof5 Challenge for Track 1 (speech deepfake detection) and Track 2 (spoofing-robust automatic speaker verification, SASV). Track 1 showcases a diverse range of technical qualities from potential processing algorithms and includes both open and closed conditions. For these conditions, our system consists of a cascade of a frontend f…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: ASVspoof5 workshop paper

  19. arXiv:2409.01585  [pdf, other]

    cs.LG cs.DC

    Buffer-based Gradient Projection for Continual Federated Learning

    Authors: Shenghong Dai, Jy-yong Sohn, Yicong Chen, S M Iftekharul Alam, Ravikumar Balakrishnan, Suman Banerjee, Nageen Himayat, Kangwook Lee

    Abstract: Continual Federated Learning (CFL) is essential for enabling real-world applications where multiple decentralized clients adaptively learn from continuous data streams. A significant challenge in CFL is mitigating catastrophic forgetting, where models lose previously acquired knowledge when learning new information. Existing approaches often face difficulties due to the constraints of device stora…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: A preliminary version of this work was presented at the Federated Learning Systems (FLSys) Workshop @ Sixth Conference on Machine Learning and Systems, June 2023

  20. arXiv:2409.01502  [pdf, other]

    cs.CV cs.AI cs.GR

    AMG: Avatar Motion Guided Video Generation

    Authors: Zhangsihao Yang, Mengyi Shan, Mohammad Farazi, Wenhui Zhu, Yanxi Chen, Xuanzhao Dong, Yalin Wang

    Abstract: Human video generation task has gained significant attention with the advancement of deep generative models. Generating realistic videos with human movements is challenging in nature, due to the intricacies of human body topology and sensitivity to visual artifacts. The extensively studied 2D media generation methods take advantage of massive human media datasets, but struggle with 3D-aware contro…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: The project page is at https://github.com/zshyang/amg

  21. arXiv:2409.01410  [pdf, other]

    cs.LG stat.CO

    Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning

    Authors: Vyacheslav Kungurtsev, Yuanfang Peng, Jianyang Gu, Saeed Vahidian, Anthony Quinn, Fadwa Idlahcen, Yiran Chen

    Abstract: Dataset distillation (DD) is an increasingly important technique that focuses on constructing a synthetic dataset capable of capturing the core information in training data to achieve comparable performance in models trained on the latter. While DD has a wide range of applications, the theory supporting it is less well evolved. New methods of DD are compared on a common set of benchmarks, rather t…

    Submitted 2 September, 2024; originally announced September 2024.

  22. arXiv:2409.01348  [pdf, other]

    cs.CV cs.CE cs.LG

    PatternPaint: Generating Layout Patterns Using Generative AI and Inpainting Techniques

    Authors: Guanglei Zhou, Bhargav Korrapati, Gaurav Rajavendra Reddy, Jiang Hu, Yiran Chen, Dipto G. Thakurta

    Abstract: Generation of VLSI layout patterns is essential for a wide range of Design For Manufacturability (DFM) studies. In this study, we investigate the potential of generative machine learning models for creating design rule legal metal layout patterns. Our results demonstrate that the proposed model can generate legal patterns in complex design rule settings and achieves a high diversity score. The des…

    Submitted 2 September, 2024; originally announced September 2024.

  23. arXiv:2409.01195  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Ground-truth effects in learning-based fiber orientation distribution estimation in neonatal brains

    Authors: Rizhong Lin, Hamza Kebiri, Ali Gholipour, Yufei Chen, Jean-Philippe Thiran, Davood Karimi, Meritxell Bach Cuadra

    Abstract: Diffusion Magnetic Resonance Imaging (dMRI) is a non-invasive method for depicting brain microstructure in vivo. Fiber orientation distributions (FODs) are mathematical representations extensively used to map white matter fiber configurations. Recently, FOD estimation with deep neural networks has seen growing success, in particular, those of neonates estimated with fewer diffusion measurements. T…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 11 pages, 4 figures; accepted as an Oral Presentation at the MICCAI 2024 Workshop on Computational Diffusion MRI (CDMRI) in Marrakech, Morocco

  24. arXiv:2409.01179  [pdf, other]

    cs.CV

    Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information

    Authors: Yi Chen, Jian Xu, Xu-Yao Zhang, Wen-Zhuo Liu, Yang-Yang Liu, Cheng-Lin Liu

    Abstract: With the advancement of large-scale language modeling techniques, large multimodal models combining visual encoders with large language models have demonstrated exceptional performance in various visual tasks. Most of the current large-scale multimodal models achieve this by mapping visual features obtained from the visual encoder into a large language model and using them as inputs alongside text…

    Submitted 2 September, 2024; originally announced September 2024.

  25. arXiv:2409.01162  [pdf, other]

    cs.CV

    Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction

    Authors: Gaotong Yu, Yi Chen, Jian Xu

    Abstract: Recently, multimodal large language models (MM-LLMs) have achieved great success in many multimodal tasks, but their high computational costs limit their further promotion and application. In the MM-LLMs framework, the main computational consumption step is the processing of concatenated text and visual tokens at the LLM layer. The length of the input token for LLM directly affects the overall tra…

    Submitted 2 September, 2024; originally announced September 2024.

  26. arXiv:2409.01014  [pdf, other]

    cs.CV cs.AI

    From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model

    Authors: Xiaojie Xu, Tianshuo Xu, Fulong Ma, Yingcong Chen

    Abstract: We explore Bird's-Eye View (BEV) generation, converting a BEV map into its corresponding multi-view street images. Valued for its unified spatial representation aiding multi-sensor fusion, BEV is pivotal for various autonomous driving applications. Creating accurate street-view images from BEV maps is essential for portraying complex traffic scenarios and enhancing driving algorithms. Concurrently…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted at International Conference on Robotics and Automation (ICRA)

  27. arXiv:2409.01011  [pdf, other]

    cs.CL cs.CV

    Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts

    Authors: Yingfa Chen, Chenlong Hu, Cong Feng, Chenyang Song, Shi Yu, Xu Han, Zhiyuan Liu, Maosong Sun

    Abstract: This study presents a multi-modal multi-granularity tokenizer specifically designed for analyzing ancient Chinese scripts, focusing on the Chu bamboo slip (CBS) script used during the Spring and Autumn and Warring States period (771-256 BCE) in Ancient China. Considering the complex hierarchical structure of ancient Chinese scripts, where a single character may be a combination of multiple sub-cha…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 12 pages, 3 figures

  28. arXiv:2409.00924  [pdf, other]

    cs.CV

    MedSAM-U: Uncertainty-Guided Auto Multi-Prompt Adaptation for Reliable MedSAM

    Authors: Nan Zhou, Ke Zou, Kai Ren, Mengting Luo, Linchao He, Meng Wang, Yidi Chen, Yi Zhang, Hu Chen, Huazhu Fu

    Abstract: The Medical Segment Anything Model (MedSAM) has shown remarkable performance in medical image segmentation, drawing significant attention in the field. However, its sensitivity to varying prompt types and locations poses challenges. This paper addresses these challenges by focusing on the development of reliable prompts that enhance MedSAM's accuracy. We introduce MedSAM-U, an uncertainty-guided f…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 10 pages, 4 figures

  29. arXiv:2409.00895  [pdf, other]

    cs.RO

    Whole-Body Control Through Narrow Gaps From Pixels To Action

    Authors: Tianyue Wu, Yeke Chen, Tianyang Chen, Guangyu Zhao, Fei Gao

    Abstract: Flying through body-size narrow gaps in the environment is one of the most challenging moments for an underactuated multirotor. We explore a purely data-driven method to master this flight skill in simulation, where a neural network directly maps pixels and proprioception to continuous low-level control commands. This learned policy enables whole-body control through gaps with different geometries…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 9 pages, 8 figures, 2 tables

  30. arXiv:2409.00843  [pdf, other]

    econ.GN cs.CE cs.CY q-fin.CP stat.ML

    Global Public Sentiment on Decentralized Finance: A Spatiotemporal Analysis of Geo-tagged Tweets from 150 Countries

    Authors: Yuqi Chen, Yifan Li, Kyrie Zhixuan Zhou, Xiaokang Fu, Lingbo Liu, Shuming Bao, Daniel Sui, Luyao Zhang

    Abstract: In the digital era, blockchain technology, cryptocurrencies, and non-fungible tokens (NFTs) have transformed financial and decentralized systems. However, existing research often neglects the spatiotemporal variations in public sentiment toward these technologies, limiting macro-level insights into their global impact. This study leverages Twitter data to explore public attention and sentiment acr…

    Submitted 1 September, 2024; originally announced September 2024.

  31. arXiv:2409.00726  [pdf, other]

    cs.CV cs.AI

    LPUWF-LDM: Enhanced Latent Diffusion Model for Precise Late-phase UWF-FA Generation on Limited Dataset

    Authors: Zhaojie Fang, Xiao Yu, Guanyu Zhou, Ke Zhuang, Yifei Chen, Ruiquan Ge, Changmiao Wang, Gangyong Jia, Qing Wu, Juan Ye, Maimaiti Nuliqiman, Peifang Xu, Ahmed Elazab

    Abstract: Ultra-Wide-Field Fluorescein Angiography (UWF-FA) enables precise identification of ocular diseases using sodium fluorescein, which can be potentially harmful. Existing research has developed methods to generate UWF-FA from Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) to reduce the adverse reactions associated with injections. However, these methods have been less effective in producin…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 13 pages, 7 figures

  32. arXiv:2409.00597  [pdf, other]

    cs.MM cs.CL

    Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model

    Authors: Fuqiang Niu, Zebang Cheng, Xianghua Fu, Xiaojiang Peng, Genan Dai, Yin Chen, Hu Huang, Bowen Zhang

    Abstract: Stance detection, which aims to identify public opinion towards specific targets using social media data, is an important yet challenging task. With the proliferation of diverse multimodal social media content including text, and images multimodal stance detection (MSD) has become a crucial research area. However, existing MSD studies have focused on modeling stance within individual text-image pa…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: ACM MM2024

  33. arXiv:2408.17424  [pdf, other]

    cs.CV cs.HC

    CinePreGen: Camera Controllable Video Previsualization via Engine-powered Diffusion

    Authors: Yiran Chen, Anyi Rao, Xuekun Jiang, Shishi Xiao, Ruiqing Ma, Zeyu Wang, Hui Xiong, Bo Dai

    Abstract: With advancements in video generative AI models (e.g., SORA), creators are increasingly using these techniques to enhance video previsualization. However, they face challenges with incomplete and mismatched AI workflows. Existing methods mainly rely on text descriptions and struggle with camera placement, a key component of previsualization. To address these issues, we introduce CinePreGen, a visu…

    Submitted 30 August, 2024; originally announced August 2024.

  34. arXiv:2408.17347  [pdf, other]

    cs.CV

    LSMS: Language-guided Scale-aware MedSegmentor for Medical Image Referring Segmentation

    Authors: Shuyi Ouyang, Jinyang Zhang, Xiangye Lin, Xilai Wang, Qingqing Chen, Yen-Wei Chen, Lanfen Lin

    Abstract: Conventional medical image segmentation methods have been found inadequate in facilitating physicians with the identification of specific lesions for diagnosis and treatment. Given the utility of text as an instructional format, we introduce a novel task termed Medical Image Referring Segmentation (MIRS), which requires segmenting specified lesions in images based on the given language expressions…

    Submitted 2 September, 2024; v1 submitted 30 August, 2024; originally announced August 2024.

    Comments: 14 pages, 5 figures

    ACM Class: I.4.6

  35. arXiv:2408.17180  [pdf, other]

    cs.AI cs.GT cs.IR cs.LG cs.MA

    Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis

    Authors: Chiu-Chou Lin, Yu-Wei Shih, Kuei-Ting Kuo, Yu-Cheng Chen, Chien-Hua Chen, Wei-Chen Chiu, I-Chen Wu

    Abstract: How can balance be quantified in game settings? This question is crucial for game designers, especially in player-versus-player (PvP) games, where analyzing the strength relations among predefined team compositions-such as hero combinations in multiplayer online battle arena (MOBA) games or decks in card games-is essential for enhancing gameplay and achieving balance. We have developed two advance…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: TMLR 09/2024 https://openreview.net/forum?id=2D36otXvBE

  36. arXiv:2408.17009  [pdf, other]

    cs.SD eess.AS

    Utilizing Speaker Profiles for Impersonation Audio Detection

    Authors: Hao Gu, JiangYan Yi, Chenglong Wang, Yong Ren, Jianhua Tao, Xinrui Yan, Yujie Chen, Xiaohui Zhang

    Abstract: Fake audio detection is an emerging active topic. A growing number of literatures have aimed to detect fake utterance, which are mostly generated by Text-to-speech (TTS) or voice conversion (VC). However, countermeasures against impersonation remain an underexplored area. Impersonation is a fake type that involves an imitator replicating specific traits and speech style of a target speaker. Unlike…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM2024

  37. arXiv:2408.16862  [pdf, other]

    stat.ML cs.LG

    Probabilistic Decomposed Linear Dynamical Systems for Robust Discovery of Latent Neural Dynamics

    Authors: Yenho Chen, Noga Mudrik, Kyle A. Johnsen, Sankaraleengam Alagapan, Adam S. Charles, Christopher J. Rozell

    Abstract: Time-varying linear state-space models are powerful tools for obtaining mathematically interpretable representations of neural signals. For example, switching and decomposed models describe complex systems using latent variables that evolve according to simple locally linear dynamics. However, existing methods for latent variable estimation are not robust to dynamical noise and system nonlinearity…

    Submitted 29 August, 2024; originally announced August 2024.

  38. An Effective Information Theoretic Framework for Channel Pruning

    Authors: Yihao Chen, Zefang Wang

    Abstract: Channel pruning is a promising method for accelerating and compressing convolutional neural networks. However, current pruning algorithms still remain unsolved problems that how to assign layer-wise pruning ratios properly and discard the least important channels with a convincing criterion. In this paper, we present a novel channel pruning approach via information theory and interpretability of n…

    Submitted 2 September, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  39. arXiv:2408.16532  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD eess.SP

    WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

    Authors: Shengpeng Ji, Ziyue Jiang, Xize Cheng, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Ruiqi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Wen Wang, Zhou Zhao

    Abstract: Language models have been effectively applied to modeling natural signals, such as images, video, speech, and audio. A crucial component of these models is the codec tokenizer, which compresses high-dimensional natural signals into lower-dimensional discrete tokens. In this paper, we introduce WavTokenizer, which offers several advantages over previous SOTA acoustic codec models in the audio domai…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Working in progress. arXiv admin note: text overlap with arXiv:2402.12208

  40. arXiv:2408.16463  [pdf, other]

    cs.LG

    An Exploratory Deep Learning Approach for Predicting Subsequent Suicidal Acts in Chinese Psychological Support Hotlines

    Authors: Changwei Song, Qing Zhao, Jianqiang Li, Yining Chen, Yongsheng Tong, Guanghui Fu

    Abstract: Psychological support hotlines are an effective suicide prevention measure that typically relies on professionals using suicide risk assessment scales to predict individual risk scores. However, the accuracy of scale-based predictive methods for suicide risk assessment can vary widely depending on the expertise of the operator. This limitation underscores the need for more reliable methods, prompt…

    Submitted 29 August, 2024; originally announced August 2024.

  41. arXiv:2408.16343  [pdf, other]

    cs.CV cs.AI

    Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach

    Authors: Yifei Chen, Shenghao Zhu, Zhaojie Fang, Chang Liu, Binfeng Zou, Yuhe Wang, Shuo Chang, Fan Jia, Feiwei Qin, Jin Fan, Yong Peng, Changmiao Wang

    Abstract: Alzheimer's Disease (AD) is a complex neurodegenerative disorder marked by memory loss, executive dysfunction, and personality changes. Early diagnosis is challenging due to subtle symptoms and varied presentations, often leading to misdiagnosis with traditional unimodal diagnostic methods due to their limited scope. This study introduces an advanced multimodal classification model that integrates…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 5 pages, 2 figures

  42. arXiv:2408.16308  [pdf, other]

    cs.SI

    AdaMotif: Graph Simplification via Adaptive Motif Design

    Authors: Hong Zhou, Peifeng Lai, Zhida Sun, Xiangyuan Chen, Yang Chen, Huisi Wu, Yong Wang

    Abstract: With the increase of graph size, it becomes difficult or even impossible to visualize graph structures clearly within the limited screen space. Consequently, it is crucial to design effective visual representations for large graphs. In this paper, we propose AdaMotif, a novel approach that can capture the essential structure patterns of large graphs and effectively reveal the overall structures vi…

    Submitted 29 August, 2024; originally announced August 2024.

  43. PACiM: A Sparsity-Centric Hybrid Compute-in-Memory Architecture via Probabilistic Approximation

    Authors: Wenlun Zhang, Shimpei Ando, Yung-Chin Chen, Satomi Miyagi, Shinya Takamaeda-Yamazaki, Kentaro Yoshioka

    Abstract: Approximate computing emerges as a promising approach to enhance the efficiency of compute-in-memory (CiM) systems in deep neural network processing. However, traditional approximate techniques often significantly trade off accuracy for power efficiency, and fail to reduce data transfer between main memory and CiM banks, which dominates power consumption. This paper introduces a novel probabilisti…

    Submitted 28 August, 2024; originally announced August 2024.

    Journal ref: IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2024)

  44. arXiv:2408.16208  [pdf, other]

    cs.LG cs.CL

    ReXamine-Global: A Framework for Uncovering Inconsistencies in Radiology Report Generation Metrics

    Authors: Oishi Banerjee, Agustina Saenz, Kay Wu, Warren Clements, Adil Zia, Dominic Buensalido, Helen Kavnoudias, Alain S. Abi-Ghanem, Nour El Ghawi, Cibele Luna, Patricia Castillo, Khaled Al-Surimi, Rayyan A. Daghistani, Yuh-Min Chen, Heng-sheng Chao, Lars Heiliger, Moon Kim, Johannes Haubold, Frederic Jonske, Pranav Rajpurkar

    Abstract: Given the rapidly expanding capabilities of generative AI models for radiology, there is a need for robust metrics that can accurately measure the quality of AI-generated radiology reports across diverse hospitals. We develop ReXamine-Global, an LLM-powered, multi-site framework that tests metrics across different writing styles and patient populations, exposing gaps in their generalization. First,…

    Submitted 28 August, 2024; originally announced August 2024.

  45. arXiv:2408.15980  [pdf, other]

    cs.RO cs.AI

    In-Context Imitation Learning via Next-Token Prediction

    Authors: Letian Fu, Huang Huang, Gaurav Datta, Lawrence Yunliang Chen, William Chung-Ho Panitch, Fangchen Liu, Hui Li, Ken Goldberg

    Abstract: We explore how to enhance next-token prediction models to perform in-context imitation learning on a real robot, where the robot executes new tasks by interpreting contextual information provided during the input phase, without updating its underlying policy parameters. We propose In-Context Robot Transformer (ICRT), a causal transformer that performs autoregressive prediction on sensorimotor traj…

    Submitted 28 August, 2024; originally announced August 2024.

  46. arXiv:2408.15925  [pdf, ps, other]

    cs.IT math.CO

    Explicit Folded Reed-Solomon and Multiplicity Codes Achieve Relaxed Generalized Singleton Bound

    Authors: Yeyuan Chen, Zihan Zhang

    Abstract: In this paper, we prove that any `appropriate' folded Reed-Solomon and univariate multiplicity codes achieve relaxed generalized Singleton bound for list size $L\ge1.$ More concretely, we show the following: (1) Any $(s,γ)$-folded RS code over the alphabet $\mathbb{F}_q^s$ of block length $n$ and rate $R$ with pair-wise distinct evaluation points…

    Submitted 28 August, 2024; originally announced August 2024.

  47. arXiv:2408.15609  [pdf, other]

    cs.NI cs.LG

    Statistical QoS Provision in Business-Centric Networks

    Authors: Chang Wu, Yuang Chen, Hancheng Lu

    Abstract: More refined resource management and Quality of Service (QoS) provisioning is a critical goal of wireless communication technologies. In this paper, we propose a novel Business-Centric Network (BCN) aimed at enabling scalable QoS provisioning, based on a cross-layer framework that captures the relationship between application, transport parameters, and channels. We investigate both continuous flow…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 13 pages

  48. arXiv:2408.15524  [pdf, other]

    cs.CV

    Ray-Distance Volume Rendering for Neural Scene Reconstruction

    Authors: Ruihong Yin, Yunlu Chen, Sezer Karaoglu, Theo Gevers

    Abstract: Existing methods in neural scene reconstruction utilize the Signed Distance Function (SDF) to model the density function. However, in indoor scenes, the density computed from the SDF for a sampled point may not consistently reflect its real importance in volume rendering, often due to the influence of neighboring objects. To tackle this issue, our work proposes a novel approach for indoor scene re…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV2024

  49. arXiv:2408.15488  [pdf, other]

    cs.CL

    Legilimens: Practical and Unified Content Moderation for Large Language Model Services

    Authors: Jialin Wu, Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Jiayang Xu, Xinfeng Li, Wenyuan Xu

    Abstract: Given the societal impact of unsafe content generated by large language models (LLMs), ensuring that LLM services comply with safety standards is a crucial concern for LLM service providers. Common content moderation methods are limited by an effectiveness-and-efficiency dilemma, where simple models are fragile while sophisticated models consume excessive computational resources. In this paper, we…

    Submitted 5 September, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Conference on Computer and Communications Security (CCS) 2024

  50. arXiv:2408.15242  [pdf, other]

    cs.CV

    Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty

    Authors: Saining Zhang, Baijun Ye, Xiaoxue Chen, Yuantao Chen, Zongzheng Zhang, Cheng Peng, Yongliang Shi, Hao Zhao

    Abstract: Robust and realistic rendering for large-scale road scenes is essential in autonomous driving simulation. Recently, 3D Gaussian Splatting (3D-GS) has made groundbreaking progress in neural rendering, but the general fidelity of large-scale road scene renderings is often limited by the input imagery, which usually has a narrow field of view and focuses mainly on the street-level local area. Intuiti…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: BMVC2024 Project Page: https://sainingzhang.github.io/project/uc-gs/ Code: https://github.com/SainingZhang/uc-gs/