Showing 1–50 of 3,289 results for author: Wu, Y

Searching in archive cs.
  1. arXiv:2409.03514  [pdf, other]

    cs.CV

    Blended Latent Diffusion under Attention Control for Real-World Video Editing

    Authors: Deyin Liu, Lin Yuanbo Wu, Xianghua Xie

    Abstract: Due to the lack of fully publicly available text-to-video models, current video editing methods tend to build on pre-trained text-to-image generation models; however, they still face grand challenges in dealing with the local editing of video with temporal information. First, although existing methods attempt to focus on local area editing by a pre-defined mask, the preservation of the outside-area ba…

    Submitted 5 September, 2024; originally announced September 2024.

  2. arXiv:2409.03283  [pdf, other]

    cs.SD eess.AS

    FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications

    Authors: Hao-Han Guo, Kun Liu, Fei-Yu Shen, Yi-Chen Wu, Feng-Long Xie, Kun Xie, Kai-Tuo Xu

    Abstract: This work proposes FireRedTTS, a foundation text-to-speech framework, to meet the growing demands for personalized and diverse generative speech applications. The framework comprises three parts: data processing, foundation system, and downstream applications. First, we comprehensively present our data processing pipeline, which transforms massive raw audio into a large-scale high-quality TTS data…

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2409.03203  [pdf, other]

    cs.CL cs.AI

    An Effective Deployment of Diffusion LM for Data Augmentation in Low-Resource Sentiment Classification

    Authors: Zhuowei Chen, Lianxi Wang, Yuben Wu, Xinfeng Liao, Yujia Tian, Junyang Zhong

    Abstract: Sentiment classification (SC) often suffers from low-resource challenges such as domain-specific contexts, imbalanced label distributions, and few-shot scenarios. The potential of the diffusion language model (LM) for textual data augmentation (DA) remains unexplored; moreover, textual DA methods struggle to balance the diversity and consistency of new samples. Most DA methods either perform logic…

    Submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2409.02492  [pdf]

    cs.CV cs.LG eess.IV

    Reliable Deep Diffusion Tensor Estimation: Rethinking the Power of Data-Driven Optimization Routine

    Authors: Jialong Li, Zhicheng Zhang, Yunwei Chen, Qiqi Lu, Ye Wu, Xiaoming Liu, QianJin Feng, Yanqiu Feng, Xinyuan Zhang

    Abstract: Diffusion tensor imaging (DTI) holds significant importance in clinical diagnosis and neuroscience research. However, conventional model-based fitting methods often suffer from sensitivity to noise, leading to decreased accuracy in estimating DTI parameters. While traditional data-driven deep learning methods have shown potential in terms of accuracy and efficiency, their limited generalization to…

    Submitted 4 September, 2024; originally announced September 2024.

  5. arXiv:2409.02076  [pdf, other]

    cs.CL

    Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models

    Authors: Yuhao Wu, Ming Shan Hee, Zhiqing Hu, Roy Ka-Wei Lee

    Abstract: The abilities of long-context language models (LMs) are often evaluated using the "Needle-in-a-Haystack" (NIAH) test, which comprises tasks designed to assess a model's ability to identify specific information ("needle") within large text sequences ("haystack"). While these benchmarks measure how well models understand long-context input sequences, they do not effectively gauge the quality of long…

    Submitted 3 September, 2024; originally announced September 2024.

  6. arXiv:2409.02070  [pdf, other]

    eess.IV cs.CV

    Explicit Differentiable Slicing and Global Deformation for Cardiac Mesh Reconstruction

    Authors: Yihao Luo, Dario Sesia, Fanwen Wang, Yinzhe Wu, Wenhao Ding, Jiahao Huang, Fadong Shi, Anoop Shah, Amit Kaural, Jamil Mayet, Guang Yang, ChoonHwai Yap

    Abstract: Mesh reconstruction of the cardiac anatomy from medical images is useful for shape and motion measurements and biophysics simulations to facilitate the assessment of cardiac function and health. However, 3D medical images are often acquired as 2D slices that are sparsely sampled and noisy, and mesh reconstruction on such data is a challenging task. Traditional voxel-based approaches rely on pre- a…

    Submitted 3 September, 2024; originally announced September 2024.

  7. arXiv:2409.01563  [pdf, other]

    cs.IR

    Blockchain-based Federated Recommendation with Incentive Mechanism

    Authors: Jianhai Chen, Yanlin Wu, Dazhong Rong, Guoyao Yu, Lingqi Jiang, Zhenguang Liu, Peng Zhou, Rui Shen

    Abstract: Nowadays, federated recommendation technology is rapidly evolving to help multiple organisations share data and train models while meeting user privacy, data security and government regulatory requirements. However, federated recommendation increases customer system costs such as power, computational and communication resources. Besides, federated recommendation systems are also susceptible to mod…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: This paper has been accepted at the 2024 Blockchain and Web3 Technology Innovation and Application Exchange Conference (BWTAC 2024)

  8. arXiv:2409.01100  [pdf, other]

    cs.CV

    OCMG-Net: Neural Oriented Normal Refinement for Unstructured Point Clouds

    Authors: Yingrui Wu, Mingyang Zhao, Weize Quan, Jian Shi, Xiaohong Jia, Dong-Ming Yan

    Abstract: We present a robust refinement method for estimating oriented normals from unstructured point clouds. In contrast to previous approaches that either suffer from high computational complexity or fail to achieve desirable accuracy, our novel framework incorporates sign orientation and data augmentation in the feature space to refine the initial oriented normals, striking a balance between efficiency…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 18 pages, 16 figures

    ACM Class: I.2; I.3

  9. arXiv:2409.01037  [pdf, other]

    cs.CL

    NYK-MS: A Well-annotated Multi-modal Metaphor and Sarcasm Understanding Benchmark on Cartoon-Caption Dataset

    Authors: Ke Chang, Hao Li, Junzhao Zhang, Yunfang Wu

    Abstract: Metaphor and sarcasm are common figurative expressions in people's communication, especially on the Internet or the memes popular among teenagers. We create a new benchmark named NYK-MS (NewYorKer for Metaphor and Sarcasm), which contains 1,583 samples for metaphor understanding tasks and 1,578 samples for sarcasm understanding tasks. These tasks include whether it contains metaphor/sarcasm, which…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 13 pages, 6 figures

  10. arXiv:2409.00985  [pdf, other]

    cs.SE cs.AI cs.CL

    Co-Learning: Code Learning for Multi-Agent Reinforcement Collaborative Framework with Conversational Natural Language Interfaces

    Authors: Jiapeng Yu, Yuqian Wu, Yajing Zhan, Wenhao Guo, Zhou Xu, Raymond Lee

    Abstract: Online question-and-answer (Q&A) systems based on the Large Language Model (LLM) have progressively diverged from recreational to professional use. This paper proposed a Multi-Agent framework with environmentally reinforcement learning (E-RL) for code correction called Code Learning (Co-Learning) community, assisting beginners to correct code errors independently. It evaluates the performance of…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 12 pages, 8 figures

  11. arXiv:2409.00719  [pdf]

    cs.NI

    Reliability-considered Multi-platoon's Groupcasting using the Resource Sharing Method

    Authors: Chung-Ming Huang, Yen-Hung Wu, Duy-Tuan Dao

    Abstract: In the context of 5G platoon communications, the Platoon Leader Vehicle (PLV) employs groupcasting to transmit control messages to Platoon Member Vehicles (PMVs). Due to the restricted transmission power for groupcasting, it may need to pick one PMV as the Platoon Relay Vehicle (PRV) to be responsible for re-groupcasting messages of PLVs. To optimize the usage of limited spectrum resources, resour…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 9 Pages, 11 Figures

    Report number: ETE-DUT-2024

    ACM Class: C.2

  12. arXiv:2409.00690  [pdf, other]

    cs.CV

    Decoupled and Interactive Regression Modeling for High-performance One-stage 3D Object Detection

    Authors: Weiping Xiao, Yiqiang Wu, Chang Liu, Yu Qin, Xiaomao Li, Liming Xin

    Abstract: Inadequate bounding box modeling in regression tasks constrains the performance of one-stage 3D object detection. Our study reveals that the primary reason lies in two aspects: (1) The limited center-offset prediction seriously impairs the bounding box localization since many highest response positions significantly deviate from object centers. (2) The low-quality sample ignored in regression task…

    Submitted 1 September, 2024; originally announced September 2024.

  13. arXiv:2409.00381  [pdf, other]

    cs.CV

    3D Gaussian Splatting for Large-scale 3D Surface Reconstruction from Aerial Images

    Authors: YuanZheng Wu, Jin Liu, Shunping Ji

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has garnered significant attention. However, the unstructured nature of 3DGS poses challenges for large-scale surface reconstruction from aerial images. To address this gap, we propose the first large-scale surface reconstruction method for multi-view stereo (MVS) aerial images based on 3DGS, named Aerial Gaussian Splatting (AGS). Initially, we introduce a da…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 11 pages

  14. arXiv:2409.00206  [pdf, other]

    cs.CV cs.RO

    RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning

    Authors: Sha Lu, Xuecheng Xu, Yuxuan Wu, Haojian Lu, Xieyuanli Chen, Rong Xiong, Yue Wang

    Abstract: Global localization using onboard perception sensors, such as cameras and LiDARs, is crucial in autonomous driving and robotics applications when GPS signals are unreliable. Most approaches achieve global localization by sequential place recognition and pose estimation. Some of them train separate models for each task, while others employ a single model with dual heads, trained jointly with separa…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: 23 pages, 19 figures

  15. arXiv:2408.17285  [pdf, other]

    cs.CR cs.LG

    Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution

    Authors: Yixin Wu, Yun Shen, Michael Backes, Yang Zhang

    Abstract: Text-to-image models, such as Stable Diffusion (SD), undergo iterative updates to improve image quality and address concerns such as safety. Improvements in image quality are straightforward to assess. However, how model updates resolve existing concerns and whether they raise new questions remain unexplored. This study takes an initial step in investigating the evolution of text-to-image models f…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: To Appear in the ACM Conference on Computer and Communications Security, October 14-18, 2024

  16. arXiv:2408.17162  [pdf, other]

    cs.LG cs.AI

    Deep Feature Embedding for Tabular Data

    Authors: Yuqian Wu, Hengyi Luo, Raymond S. T. Lee

    Abstract: Tabular data learning has extensive applications in deep learning but its existing embedding techniques are limited in numerical and categorical features such as the inability to capture complex relationships and engineering. This paper proposes a novel deep embedding framework that leverages lightweight deep neural networks to generate effective feature embeddings for tabular data in machine lear…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 15 pages, 2 figures, accepted to ICONIP 2024, Paper ID: 1399

  17. arXiv:2408.17097  [pdf, other]

    cs.NI

    Reasoning AI Performance Degradation in 6G Networks with Large Language Models

    Authors: Liming Huang, Yulei Wu, Dimitra Simeonidou

    Abstract: The integration of Artificial Intelligence (AI) within 6G networks is poised to revolutionize connectivity, reliability, and intelligent decision-making. However, the performance of AI models in these networks is crucial, as any decline can significantly impact network efficiency and the services it supports. Understanding the root causes of performance degradation is essential for maintaining opt…

    Submitted 30 August, 2024; originally announced August 2024.

  18. arXiv:2408.17017  [pdf, other]

    cs.CL cs.AI

    Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling

    Authors: Guangya Wan, Yuqi Wu, Jie Chen, Sheng Li

    Abstract: Self-Consistency (SC) is a widely used method to mitigate hallucinations in Large Language Models (LLMs) by sampling the LLM multiple times and outputting the most frequent solution. Despite its benefits, SC results in significant computational costs proportional to the number of samples generated. Previous early-stopping approaches, such as Early Stopping Self Consistency and Adaptive Consistency…

    Submitted 30 August, 2024; originally announced August 2024.

  19. arXiv:2408.16965  [pdf, other]

    cs.CV

    Contrastive Learning with Synthetic Positives

    Authors: Dewen Zeng, Yawen Wu, Xinrong Hu, Xiaowei Xu, Yiyu Shi

    Abstract: Contrastive learning with the nearest neighbor has proved to be one of the most efficient self-supervised learning (SSL) techniques by utilizing the similarity of multiple instances within the same class. However, its efficacy is constrained as the nearest neighbor algorithm primarily identifies "easy" positive pairs, where the representations are already closely located in the embedding space.…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 8 pages, conference

  20. arXiv:2408.16684  [pdf, other]

    cs.CV

    PartFormer: Awakening Latent Diverse Representation from Vision Transformer for Object Re-Identification

    Authors: Lei Tan, Pingyang Dai, Jie Chen, Liujuan Cao, Yongjian Wu, Rongrong Ji

    Abstract: Extracting robust feature representation is critical for object re-identification to accurately identify objects across non-overlapping cameras. Although having a strong representation ability, the Vision Transformer (ViT) tends to overfit on most distinct regions of training data, limiting its generalizability and attention to holistic object features. Meanwhile, due to the structural difference…

    Submitted 29 August, 2024; originally announced August 2024.

  21. arXiv:2408.15978  [pdf, other]

    cs.AI

    WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

    Authors: Yao Zhang, Zijian Ma, Yunpu Ma, Zhen Han, Yu Wu, Volker Tresp

    Abstract: LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by…

    Submitted 28 August, 2024; originally announced August 2024.

  22. arXiv:2408.15663  [pdf, other]

    cs.RO

    NeuroVE: Brain-inspired Linear-Angular Velocity Estimation with Spiking Neural Networks

    Authors: Xiao Li, Xieyuanli Chen, Ruibin Guo, Yujie Wu, Zongtan Zhou, Fangwen Yu, Huimin Lu

    Abstract: Vision-based ego-velocity estimation is a fundamental problem in robot state estimation. However, the constraints of frame-based cameras, including motion blur and insufficient frame rates in dynamic settings, readily lead to the failure of conventional velocity estimation techniques. Mammals exhibit a remarkable ability to accurately estimate their ego-velocity during aggressive movement. Hence,…

    Submitted 28 August, 2024; originally announced August 2024.

  23. arXiv:2408.15563  [pdf, other]

    cs.DB

    Order-preserving pattern mining with forgetting mechanism

    Authors: Yan Li, Chenyu Ma, Rong Gao, Youxi Wu, Jinyan Li, Wenjian Wang, Xindong Wu

    Abstract: Order-preserving pattern (OPP) mining is a type of sequential pattern mining method in which a group of ranks of time series is used to represent an OPP. This approach can discover frequent trends in time series. Existing OPP mining algorithms consider data points at different time to be equally important; however, newer data usually have a more significant impact, while older data have a weaker i…

    Submitted 2 September, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  24. arXiv:2408.14925  [pdf, other]

    cs.NE cs.AI

    Distance-Forward Learning: Enhancing the Forward-Forward Algorithm Towards High-Performance On-Chip Learning

    Authors: Yujie Wu, Siyuan Xu, Jibin Wu, Lei Deng, Mingkun Xu, Qinghao Wen, Guoqi Li

    Abstract: The Forward-Forward (FF) algorithm was recently proposed as a local learning method to address the limitations of backpropagation (BP), offering biological plausibility along with memory-efficient and highly parallelized computational benefits. However, it suffers from suboptimal performance and poor generalization, largely due to inadequate theoretical support and a lack of effective learning str…

    Submitted 27 August, 2024; originally announced August 2024.

  25. arXiv:2408.14917  [pdf, other]

    cs.NE

    PMSN: A Parallel Multi-compartment Spiking Neuron for Multi-scale Temporal Processing

    Authors: Xinyi Chen, Jibin Wu, Chenxiang Ma, Yinsong Yan, Yujie Wu, Kay Chen Tan

    Abstract: Spiking Neural Networks (SNNs) hold great potential to realize brain-inspired, energy-efficient computational systems. However, current SNNs still fall short in terms of multi-scale temporal processing compared to their biological counterparts. This limitation has resulted in poor performance in many pattern recognition tasks with information that varies across different timescales. To address thi…

    Submitted 27 August, 2024; originally announced August 2024.

  26. arXiv:2408.14523  [pdf, other]

    cs.LG cs.AI

    Retrieval Augmented Generation for Dynamic Graph Modeling

    Authors: Yuxia Wu, Yuan Fang, Lizi Liao

    Abstract: Dynamic graph modeling is crucial for analyzing evolving patterns in various applications. Existing approaches often integrate graph neural networks with temporal modules or redefine dynamic graph modeling as a generative sequence task. However, these methods typically rely on isolated historical contexts of the target nodes from a narrow perspective, neglecting occurrences of similar patterns or…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Under review

  27. arXiv:2408.14134  [pdf, other]

    cs.LG cs.AI cs.CL cs.SI

    Exploring the Potential of Large Language Models for Heterophilic Graphs

    Authors: Yuxia Wu, Shujie Li, Yuan Fang, Chuan Shi

    Abstract: Graph Neural Networks (GNNs) are essential for various graph-based learning tasks. Notably, classical GNN architectures operate under the assumption of homophily, which posits that connected nodes are likely to share similar features. However, this assumption limits the effectiveness of GNNs in handling heterophilic graphs where connected nodes often exhibit dissimilar characteristics. Existing ap…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Under review

  28. arXiv:2408.13991  [pdf, other]

    cs.LG cs.AI

    Dual-CBA: Improving Online Continual Learning via Dual Continual Bias Adaptors from a Bi-level Optimization Perspective

    Authors: Quanziang Wang, Renzhen Wang, Yichen Wu, Xixi Jia, Minghao Zhou, Deyu Meng

    Abstract: In online continual learning (CL), models trained on changing distributions easily forget previously learned knowledge and bias toward newly received tasks. To address this issue, we present Continual Bias Adaptor (CBA), a bi-level framework that augments the classification network to adapt to catastrophic distribution shifts during training, enabling the network to achieve a stable consolidation…

    Submitted 25 August, 2024; originally announced August 2024.

  29. arXiv:2408.13940  [pdf, other]

    cs.CL

    CoT Rerailer: Enhancing the Reliability of Large Language Models in Complex Reasoning Tasks through Error Detection and Correction

    Authors: Guangya Wan, Yuqi Wu, Jie Chen, Sheng Li

    Abstract: Chain-of-Thought (CoT) prompting enhances Large Language Models (LLMs) complex reasoning abilities by generating intermediate steps. However, these steps can introduce hallucinations and accumulate errors. We propose the CoT Rerailer to address these challenges, employing self-consistency and multi-agent debate systems to identify and rectify errors in the reasoning process. The CoT Rerailer first…

    Submitted 25 August, 2024; originally announced August 2024.

  30. arXiv:2408.13741  [pdf, other]

    cs.CR

    CAMH: Advancing Model Hijacking Attack in Machine Learning

    Authors: Xing He, Jiahao Chen, Yuwen Pu, Qingming Li, Chunyi Zhou, Yingcai Wu, Jinbao Li, Shouling Ji

    Abstract: In the burgeoning domain of machine learning, the reliance on third-party services for model training and the adoption of pre-trained models have surged. However, this reliance introduces vulnerabilities to model hijacking attacks, where adversaries manipulate models to perform unintended tasks, leading to significant security and ethical concerns, like turning an ordinary image classifier into a…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 9 pages

  31. arXiv:2408.13247  [pdf, other]

    cs.CR cs.AI cs.CL cs.CY cs.LG

    Data Exposure from LLM Apps: An In-depth Investigation of OpenAI's GPTs

    Authors: Evin Jaff, Yuhao Wu, Ning Zhang, Umar Iqbal

    Abstract: LLM app ecosystems are quickly maturing and supporting a wide range of use cases, which requires them to collect excessive user data. Given that the LLM apps are developed by third-parties and that anecdotal evidence suggests LLM platforms currently do not strictly enforce their policies, user data shared with arbitrary third-parties poses a significant privacy risk. In this paper we aim to bring…

    Submitted 23 August, 2024; originally announced August 2024.

  32. arXiv:2408.13236  [pdf, other]

    cs.SI

    Large-scale Collective Dynamics in the Three Iterations of the Reddit r/place Experiment

    Authors: Yutong Wu, Arlei Silva

    Abstract: The Reddit r/place experiments were a series of online social experiments hosted by Reddit in 2017, 2022, and 2023, where users were allowed to update the colors of pixels in a large shared canvas. The largest of these experiments (in 2022) has attracted over 100 million users who collaborated and competed to produce elaborate artworks that together provide a unique view of the shared interests co…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 8 pages, 8 figures

  33. arXiv:2408.13040  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG

    SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks

    Authors: Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-Wen Li, Hung-yi Lee

    Abstract: Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages. It allows an LM to adapt to new tasks with minimal training and parameter updates, thus achieving efficiency in both storage and computation. Additionally, prompting modifies only the LM's inputs and harnesses the generative capabilities of language models to address va…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

    Journal ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 3730-3744, 2024

  34. arXiv:2408.12658  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music

    Authors: Nithya Shikarpur, Krishna Maneesha Dendukuri, Yusong Wu, Antoine Caillon, Cheng-Zhi Anna Huang

    Abstract: Hindustani music is a performance-driven oral tradition that exhibits the rendition of rich melodic patterns. In this paper, we focus on generative modeling of singers' vocal melodies extracted from audio recordings, as the voice is musically prominent within the tradition. Prior generative work in Hindustani music models melodies as coarse discrete symbols which fails to capture the rich expressi…

    Submitted 26 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted at International Society for Music Information Retrieval (ISMIR) 2024

  35. arXiv:2408.12285  [pdf, other]

    cs.RO

    Tactile-Morph Skills: Energy-Based Control Meets Data-Driven Learning

    Authors: Anran Zhang, Kübra Karacan, Hamid Sadeghian, Yansong Wu, Fan Wu, Sami Haddadin

    Abstract: Robotic manipulation is essential for modernizing factories and automating industrial tasks like polishing, which require advanced tactile abilities. These robots must be easily set up, safely work with humans, learn tasks autonomously, and transfer skills to similar tasks. Addressing these needs, we introduce the tactile-morph skill framework, which integrates unified force-impedance control with…

    Submitted 23 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: 15 pages, 7 figures, updated footnote

  36. arXiv:2408.12249  [pdf, other]

    cs.CL cs.AI cs.LG

    LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction

    Authors: Aishik Nagar, Viktor Schlegel, Thanh-Tung Nguyen, Hao Li, Yuping Wu, Kuluhan Binici, Stefan Winkler

    Abstract: Large Language Models (LLMs) are increasingly adopted for applications in healthcare, reaching the performance of domain experts on tasks such as question answering and document summarisation. Despite their success on these tasks, it is unclear how well LLMs perform on tasks that are traditionally pursued in the biomedical domain, such as structured information extraction. To bridge this gap, in th…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 11 pages

  37. arXiv:2408.12214  [pdf, other]

    cs.AI

    UNCO: Towards Unifying Neural Combinatorial Optimization through Large Language Model

    Authors: Xia Jiang, Yaoxin Wu, Yuan Wang, Yingqian Zhang

    Abstract: Recently, applying neural networks to address combinatorial optimization problems (COPs) has attracted considerable research attention. The prevailing methods always train deep models independently on specific problems, lacking a unified framework for concurrently tackling various COPs. To this end, we propose a unified neural combinatorial optimization (UNCO) framework to solve different types of…

    Submitted 22 August, 2024; originally announced August 2024.

  38. arXiv:2408.11609  [pdf, other]

    cs.CL cs.AI

    Xinyu: An Efficient LLM-based System for Commentary Generation

    Authors: Yiquan Wu, Bo Tang, Chenyang Xi, Yu Yu, Pengyu Wang, Yifei Liu, Kun Kuang, Haiying Deng, Zhiyu Li, Feiyu Xiong, Jie Hu, Peng Cheng, Zhonghao Wang, Yi Wang, Yi Luo, Mingchuan Yang

    Abstract: Commentary provides readers with a deep understanding of events by presenting diverse arguments and evidence. However, creating commentary is a time-consuming task, even for skilled commentators. Large language models (LLMs) have simplified the process of natural language generation, but their direct application in commentary creation still faces challenges due to unique task requirements. These r…

    Submitted 22 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    ACM Class: I.2.7

  39. arXiv:2408.11358  [pdf, ps, other]

    cs.CY

    Gender Bias Evaluation in Text-to-image Generation: A Survey

    Authors: Yankun Wu, Yuta Nakashima, Noa Garcia

    Abstract: The rapid development of text-to-image generation has brought rising ethical considerations, especially regarding gender bias. Given a text prompt as input, text-to-image models generate images according to the prompt. Pioneering models such as Stable Diffusion and DALL-E 2 have demonstrated remarkable capabilities in producing high-fidelity images from natural language prompts. However, these mod…

    Submitted 21 August, 2024; originally announced August 2024.

  40. arXiv:2408.11311  [pdf, other]

    cs.AR quant-ph

    HiMA: Hierarchical Quantum Microarchitecture for Qubit-Scaling and Quantum Process-Level Parallelism

    Authors: Qi Zhou, Zi-Hao Mei, Han-Qing Shi, Liang-Liang Guo, Xiao-Yan Yang, Yun-Jie Wang, Xiao-Fan Xu, Cheng Xue, Wei-Cheng Kong, Jun-Chao Wang, Yu-Chun Wu, Zhao-Yun Chen, Guo-Ping Guo

    Abstract: Quantum computing holds immense potential for addressing a myriad of intricate challenges, which is significantly amplified when scaled to thousands of qubits. However, a major challenge lies in developing an efficient and scalable quantum control system. To address this, we propose a novel Hierarchical MicroArchitecture (HiMA) designed to facilitate qubit scaling and exploit quantum process-level…

    Submitted 20 August, 2024; originally announced August 2024.

  41. arXiv:2408.11306  [pdf, other]

    cs.LG cs.AI

    KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting?

    Authors: Xiao Han, Xinfeng Zhang, Yiling Wu, Zhenduo Zhang, Zhe Wu

    Abstract: Time series forecasting is a crucial task that predicts the future values of variables based on historical data. Time series forecasting techniques have been developing in parallel with the machine learning community, from early statistical learning methods to current deep learning methods. Although existing methods have made significant progress, they still suffer from two challenges. The mathema…

    Submitted 20 August, 2024; originally announced August 2024.

  42. arXiv:2408.11227  [pdf]

    eess.IV cs.AI cs.CV

    OCTCube: A 3D foundation model for optical coherence tomography that improves cross-dataset, cross-disease, cross-device and cross-modality analysis

    Authors: Zixuan Liu, Hanwen Xu, Addie Woicik, Linda G. Shapiro, Marian Blazes, Yue Wu, Cecilia S. Lee, Aaron Y. Lee, Sheng Wang

    Abstract: Optical coherence tomography (OCT) has become critical for diagnosing retinal diseases as it enables 3D images of the retina and optic nerve. OCT acquisition is fast, non-invasive, affordable, and scalable. Due to its broad applicability, massive numbers of OCT images have been accumulated in routine exams, making it possible to train large-scale foundation models that can generalize to various di…

    Submitted 20 August, 2024; originally announced August 2024.

  43. arXiv:2408.10826  [pdf, other]

    cs.DC

    NeuLite: Memory-Efficient Federated Learning via Elastic Progressive Training

    Authors: Yebo Wu, Li Li, Chunlin Tian, Dubing Chen, Chengzhong Xu

    Abstract: Federated Learning (FL) emerges as a new learning paradigm that enables multiple devices to collaboratively train a shared model while preserving data privacy. However, intensive memory footprint during the training process severely bottlenecks the deployment of FL on resource-constrained devices in real-world cases. In this paper, we propose NeuLite, a framework that breaks the memory wall throug…

    Submitted 20 August, 2024; originally announced August 2024.

  44. arXiv:2408.10666  [pdf, other

    cs.IR

    Accelerating the Surrogate Retraining for Poisoning Attacks against Recommender Systems

    Authors: Yunfan Wu, Qi Cao, Shuchang Tao, Kaike Zhang, Fei Sun, Huawei Shen

    Abstract: Recent studies have demonstrated the vulnerability of recommender systems to data poisoning attacks, where adversaries inject carefully crafted fake user interactions into the training data of recommenders to promote target items. Current attack methods involve iteratively retraining a surrogate recommender on the poisoned data with the latest fake users to optimize the attack. However, this repet…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by RecSys 2024

  45. arXiv:2408.10624  [pdf, other

    cs.CV cs.AI

    WRIM-Net: Wide-Ranging Information Mining Network for Visible-Infrared Person Re-Identification

    Authors: Yonggan Wu, Ling-Chao Meng, Yuan Zichao, Sixian Chan, Hong-Qiang Wang

    Abstract: For the visible-infrared person re-identification (VI-ReID) task, one of the primary challenges lies in significant cross-modality discrepancy. Existing methods struggle to conduct modality-invariant information mining. They often focus solely on mining singular dimensions like spatial or channel, and overlook the extraction of specific-modality multi-dimension information. To fully mine modality-…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 18 pages, 5 figures

  46. arXiv:2408.10609  [pdf, other

    cs.LG q-bio.GN stat.ML

    PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis

    Authors: Yan Wu, Esther Wershof, Sebastian M Schmon, Marcel Nassar, Błażej Osiński, Ridvan Eksi, Kun Zhang, Thore Graepel

    Abstract: We present a comprehensive framework for predicting the effects of perturbations in single cells, designed to standardize benchmarking in this rapidly evolving field. Our framework, PerturBench, includes a user-friendly platform, diverse datasets, metrics for fair model comparison, and detailed performance analysis. Extensive evaluations of published and baseline models reveal limitations like mod…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 9 pages plus 19 pages supplementary material. Code is available at https://github.com/altoslabs/perturbench

  47. arXiv:2408.10567  [pdf, other

    q-bio.NC cs.AI cs.CV cs.LG

    Prompt Your Brain: Scaffold Prompt Tuning for Efficient Adaptation of fMRI Pre-trained Model

    Authors: Zijian Dong, Yilei Wu, Zijiao Chen, Yichi Zhang, Yueming Jin, Juan Helen Zhou

    Abstract: We introduce Scaffold Prompt Tuning (ScaPT), a novel prompt-based framework for adapting large-scale functional magnetic resonance imaging (fMRI) pre-trained models to downstream tasks, with high parameter efficiency and improved performance compared to fine-tuning and baselines for prompt tuning. The full fine-tuning updates all pre-trained parameters, which may distort the learned feature space…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: MICCAI 2024

  48. arXiv:2408.10368  [pdf, other

    cs.LG cs.CE q-fin.CP

    Deep-MacroFin: Informed Equilibrium Neural Network for Continuous Time Economic Models

    Authors: Yuntao Wu, Jiayuan Guo, Goutham Gopalakrishna, Zisis Poulos

    Abstract: In this paper, we present Deep-MacroFin, a comprehensive framework designed to solve partial differential equations, with a particular focus on models in continuous time economics. This framework leverages deep learning methodologies, including conventional Multi-Layer Perceptrons and the newly developed Kolmogorov-Arnold Networks. It is optimized using economic information encapsulated by Hamilto…

    Submitted 3 September, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: 25 pages, 8 figures

    ACM Class: I.0; J.4

  49. arXiv:2408.10116  [pdf, other

    cs.SE

    Vulseye: Detect Smart Contract Vulnerabilities via Stateful Directed Graybox Fuzzing

    Authors: Ruichao Liang, Jing Chen, Cong Wu, Kun He, Yueming Wu, Ruochen Cao, Ruiying Du, Yang Liu, Ziming Zhao

    Abstract: Smart contracts, the cornerstone of decentralized applications, have become increasingly prominent in revolutionizing the digital landscape. However, vulnerabilities in smart contracts pose great risks to user assets and undermine overall trust in decentralized systems. But current smart contract fuzzers fall short of expectations in testing efficiency for two primary reasons. Firstly, smart contr…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Submitted to TIFS

  50. arXiv:2408.09974  [pdf, other

    cs.LG

    The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective

    Authors: Renye Yan, Yaozhong Gan, You Wu, Ling Liang, Junliang Xing, Yimao Cai, Ru Huang

    Abstract: The imbalance of exploration and exploitation has long been a significant challenge in reinforcement learning. In policy optimization, excessive reliance on exploration reduces learning efficiency, while over-dependence on exploitation might trap agents in local optima. This paper revisits the exploration-exploitation dilemma from the perspective of entropy by revealing the relationship between en…

    Submitted 19 August, 2024; originally announced August 2024.