Showing 1–50 of 175 results for author: Cheng, N

  1. arXiv:2409.11146  [pdf, other]

    cs.RO

    MI-HGNN: Morphology-Informed Heterogeneous Graph Neural Network for Legged Robot Contact Perception

    Authors: Daniel Butterfield, Sandilya Sai Garimella, Nai-Jen Cheng, Lu Gan

    Abstract: We present a Morphology-Informed Heterogeneous Graph Neural Network (MI-HGNN) for learning-based contact perception. The architecture and connectivity of the MI-HGNN are constructed from the robot morphology, in which nodes and edges are robot joints and links, respectively. By incorporating the morphology-informed constraints into a neural network, we improve a learning-based approach using model…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: 6 pages, 5 figures; This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    ACM Class: E.1; I.2.6; I.2.9; J.2

  2. arXiv:2409.06402  [pdf, other]

    cs.LG cs.AI math-ph

    Symmetry Breaking in Neural Network Optimization: Insights from Input Dimension Expansion

    Authors: Jun-Jie Zhang, Nan Cheng, Fu-Peng Li, Xiu-Cheng Wang, Jian-Nan Chen, Long-Gang Pang, Deyu Meng

    Abstract: Understanding the mechanisms behind neural network optimization is crucial for improving network design and performance. While various optimization techniques have been developed, a comprehensive understanding of the underlying principles that govern these techniques remains elusive. Specifically, the role of symmetry breaking, a fundamental concept in physics, has not been fully explored in neura…

    Submitted 12 September, 2024; v1 submitted 10 September, 2024; originally announced September 2024.

    Comments: 29 pages, 8 figures

  3. arXiv:2409.00324  [pdf, other]

    cs.NI

    User-centric Service Provision for Edge-assisted Mobile AR: A Digital Twin-based Approach

    Authors: Conghao Zhou, Jie Gao, Yixiang Liu, Shisheng Hu, Nan Cheng, Xuemin Shen

    Abstract: Future 6G networks are envisioned to support mobile augmented reality (MAR) applications and provide customized immersive experiences for users via advanced service provision. In this paper, we investigate user-centric service provision for edge-assisted MAR to support the timely camera frame uploading of an MAR device by optimizing the spectrum resource reservation. To address the challenge of no…

    Submitted 30 August, 2024; originally announced September 2024.

  4. arXiv:2409.00036  [pdf, other]

    cs.IT cs.LG cs.MA eess.SY

    GNN-Empowered Effective Partial Observation MARL Method for AoI Management in Multi-UAV Network

    Authors: Yuhao Pan, Xiucheng Wang, Zhiyao Xu, Nan Cheng, Wenchao Xu, Jun-jie Zhang

    Abstract: Unmanned Aerial Vehicles (UAVs), due to their low cost and high flexibility, have been widely used in various scenarios to enhance network performance. However, the optimization of UAV trajectories in unknown areas or areas without sufficient prior information still faces challenges related to poor planning performance and low distributed execution. These challenges arise when UAVs rely solely on…

    Submitted 17 August, 2024; originally announced September 2024.

  5. arXiv:2408.15339  [pdf, other]

    cs.LG cs.CL

    UNA: Unifying Alignments of RLHF/PPO, DPO and KTO by a Generalized Implicit Reward Function

    Authors: Zhichao Wang, Bin Bi, Can Huang, Shiva Kumar Pentyala, Zixu James Zhu, Sitaram Asur, Na Claire Cheng

    Abstract: An LLM is pretrained on trillions of tokens, but the pretrained LLM may still generate undesired responses. To solve this problem, alignment techniques such as RLHF, DPO and KTO are proposed. However, these alignment techniques have limitations. For example, RLHF requires training the reward model and policy separately, which is complex, time-consuming, memory intensive and unstable during trainin…

    Submitted 27 August, 2024; originally announced August 2024.

  6. arXiv:2408.14831  [pdf, other]

    cs.LG cs.DC cs.NI

    DRL-Based Federated Self-Supervised Learning for Task Offloading and Resource Allocation in ISAC-Enabled Vehicle Edge Computing

    Authors: Xueying Gu, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief

    Abstract: Intelligent Transportation Systems (ITS) leverage Integrated Sensing and Communications (ISAC) to enhance data exchange between vehicles and infrastructure in the Internet of Vehicles (IoV). This integration inevitably increases computing demands, risking real-time system stability. Vehicle Edge Computing (VEC) addresses this by offloading tasks to Road Side Unit (RSU), ensuring timely services. O…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: This paper has been submitted to Digital Communications and Networks. The source code has been released at: https://github.com/qiongwu86/Federated-SSL-task-offloading-and-resource-allocation

  7. arXiv:2408.09194  [pdf, other]

    cs.CV cs.LG cs.NI

    DRL-Based Resource Allocation for Motion Blur Resistant Federated Self-Supervised Learning in IoV

    Authors: Xueying Gu, Qiong Wu, Pingyi Fan, Qiang Fan, Nan Cheng, Wen Chen, Khaled B. Letaief

    Abstract: In the Internet of Vehicles (IoV), Federated Learning (FL) provides a privacy-preserving solution by aggregating local models without sharing data. Traditional supervised learning requires image data with labels, but data labeling involves significant manual effort. Federated Self-Supervised Learning (FSSL) utilizes Self-Supervised Learning (SSL) for local training in FL, eliminating the need for…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/DRL-BFSSL

  8. arXiv:2408.08593  [pdf, other]

    cs.LG eess.SY

    RadioDiff: An Effective Generative Diffusion Model for Sampling-Free Dynamic Radio Map Construction

    Authors: Xiucheng Wang, Keda Tao, Nan Cheng, Zhisheng Yin, Zan Li, Yuan Zhang, Xuemin Shen

    Abstract: Radio map (RM) is a promising technology that can obtain pathloss based on only location, which is significant for 6G network applications to reduce the communication costs for pathloss estimation. However, traditional RM construction is either computationally intensive or depends on costly sampling-based pathloss measurements. Although the neural network (NN)-based method can efficientl…

    Submitted 16 August, 2024; originally announced August 2024.

  9. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  10. arXiv:2407.13123  [pdf, other]

    cs.LG cs.DC cs.NI eess.SP

    Reconfigurable Intelligent Surface Aided Vehicular Edge Computing: Joint Phase-shift Optimization and Multi-User Power Allocation

    Authors: Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief

    Abstract: Vehicular edge computing (VEC) is an emerging technology with significant potential in the field of internet of vehicles (IoV), enabling vehicles to perform intensive computational tasks locally or offload them to nearby edge devices. However, the quality of communication links may be severely deteriorated due to obstacles such as buildings, impeding the offloading process. To address this challen…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at https://github.com/qiongwu86/DDPG-RIS-MADDPG-POWER. arXiv admin note: text overlap with arXiv:2406.11318

  11. arXiv:2407.07575  [pdf, other]

    cs.LG cs.NI

    Resource Allocation for Twin Maintenance and Computing Task Processing in Digital Twin Vehicular Edge Computing Network

    Authors: Yu Xie, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief

    Abstract: As a promising technology, vehicular edge computing (VEC) can provide computing and caching services by deploying VEC servers near vehicles. However, VEC networks still face challenges such as high vehicle mobility. Digital twin (DT), an emerging technology, can predict, estimate, and analyze real-time states by digitally modeling objects in the physical world. By integrating DT with VEC, a virtua…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/Resource-allocation-for-twin-maintenance-and-computing-tasks-in-digital-twin-mobile-edge-network

  12. arXiv:2407.06767  [pdf, other]

    cs.IT eess.SP

    Enhancing Robustness and Security in ISAC Network Design: Leveraging Transmissive Reconfigurable Intelligent Surface with RSMA

    Authors: Ziwei Liu, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Qiong Wu, Nan Cheng

    Abstract: In this paper, we propose a novel transmissive reconfigurable intelligent surface transceiver-enhanced robust and secure integrated sensing and communication network. A time-division sensing communication mechanism is designed for the scenario, which enables communication and sensing to share wireless resources. To address the interference management problem and hinder eavesdropping, we implement…

    Submitted 9 July, 2024; originally announced July 2024.

  13. arXiv:2407.06518  [pdf, other]

    cs.LG cs.NI

    Graph Neural Networks and Deep Reinforcement Learning Based Resource Allocation for V2X Communications

    Authors: Maoxin Ji, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief

    Abstract: In the rapidly evolving landscape of Internet of Vehicles (IoV) technology, Cellular Vehicle-to-Everything (C-V2X) communication has attracted much attention due to its superior performance in coverage, latency, and throughput. Resource allocation within C-V2X is crucial for ensuring the transmission of safety information and meeting the stringent requirements for ultra-low latency and high reliab…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 14 pages, 11 figures. This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/GNN-and-DRL-Based-Resource-Allocation-for-V2X-Communications

  14. arXiv:2407.05261  [pdf, other]

    math.OC cs.LG cs.MS stat.ML

    Disciplined Geodesically Convex Programming

    Authors: Andrew Cheng, Vaibhav Dixit, Melanie Weber

    Abstract: Convex programming plays a fundamental role in machine learning, data science, and engineering. Testing convexity structure in nonlinear programs relies on verifying the convexity of objectives and constraints. Grant et al. (2006) introduced a framework, Disciplined Convex Programming (DCP), for automating this verification task for a wide range of convex functions that can be decomposed…

    Submitted 7 July, 2024; originally announced July 2024.

  15. arXiv:2407.03668  [pdf, other]

    cs.LG eess.SY

    Reliable Projection Based Unsupervised Learning for Semi-Definite QCQP with Application of Beamforming Optimization

    Authors: Xiucheng Wang, Qi Qiu, Nan Cheng

    Abstract: In this paper, we investigate a special class of quadratic-constrained quadratic programming (QCQP) with semi-definite constraints. Traditionally, since such a problem is non-convex and NP-hard, the neural network (NN) is regarded as a promising method to obtain a high-performing solution. However, due to the inherent prediction error, it is challenging to ensure all solutions output by the NN are fe…

    Submitted 9 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  16. arXiv:2407.02342  [pdf, ps, other]

    cs.LG cs.DC cs.MA cs.NI

    Optimizing Age of Information in Vehicular Edge Computing with Federated Graph Neural Network Multi-Agent Reinforcement Learning

    Authors: Wenhua Wang, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief

    Abstract: With the rapid development of intelligent vehicles and Intelligent Transport Systems (ITS), the sensors such as cameras and LiDAR installed on intelligent vehicles provide higher capacity of executing computation-intensive and delay-sensitive tasks, thereby raising deployment costs. To address this issue, Vehicular Edge Computing (VEC) has been proposed to process data through Road Side Units (RS…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/Optimizing-AoI-in-VEC-with-Federated-Graph-Neural-Network-Multi-Agent-Reinforcement-Learning

  17. arXiv:2406.13568  [pdf, other]

    cs.AI

    Trapezoidal Gradient Descent for Effective Reinforcement Learning in Spiking Networks

    Authors: Yuhao Pan, Xiucheng Wang, Nan Cheng, Qi Qiu

    Abstract: With the rapid development of artificial intelligence technology, the field of reinforcement learning has continuously achieved breakthroughs in both theory and practice. However, traditional reinforcement learning algorithms often entail high energy consumption during interactions with the environment. Spiking Neural Networks (SNNs), with their low energy consumption characteristics and performance…

    Submitted 19 June, 2024; originally announced June 2024.

  18. arXiv:2406.13145  [pdf, other]

    eess.SY cs.LG

    Constructing and Evaluating Digital Twins: An Intelligent Framework for DT Development

    Authors: Longfei Ma, Nan Cheng, Xiucheng Wang, Jiong Chen, Yinjun Gao, Dongxiao Zhang, Jun-Jie Zhang

    Abstract: The development of Digital Twins (DTs) represents a transformative advance for simulating and optimizing complex systems in a controlled digital space. Despite their potential, the challenge of constructing DTs that accurately replicate and predict the dynamics of real-world systems remains substantial. This paper introduces an intelligent framework for the construction and evaluation of DTs, spec…

    Submitted 18 June, 2024; originally announced June 2024.

  19. arXiv:2406.12238  [pdf, other]

    cs.CL

    PFID: Privacy First Inference Delegation Framework for LLMs

    Authors: Haoyan Yang, Zhitao Li, Yong Zhang, Jianzong Wang, Ning Cheng, Ming Li, Jing Xiao

    Abstract: This paper introduces a novel privacy-preservation framework named PFID for LLMs that addresses critical privacy concerns by localizing user data through model sharding and singular value decomposition. When users are interacting with LLM systems, their prompts could be subject to being exposed to eavesdroppers within or outside LLM system providers who are interested in collecting users' input. I…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Submitted to EMNLP2024

  20. arXiv:2406.11318  [pdf, other]

    cs.MA cs.DC cs.LG cs.NI eess.SP

    Reconfigurable Intelligent Surface Assisted VEC Based on Multi-Agent Reinforcement Learning

    Authors: Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Qiang Fan, Jiangzhou Wang

    Abstract: Vehicular edge computing (VEC) is an emerging technology that enables vehicles to perform high-intensity tasks by executing tasks locally or offloading them to nearby edge devices. However, obstacles such as buildings may degrade the communications and incur communication interruptions, and thus the vehicle may not meet the requirement for task offloading. Reconfigurable intelligent surfaces (RIS)…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/RIS-VEC-MARL.git

  21. arXiv:2406.11245  [pdf, other]

    cs.LG cs.DC cs.NI eess.SP

    Deep-Reinforcement-Learning-Based AoI-Aware Resource Allocation for RIS-Aided IoV Networks

    Authors: Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief

    Abstract: Reconfigurable Intelligent Surface (RIS) is a pivotal technology in communication, offering an alternative path that significantly enhances the link quality in wireless communication environments. In this paper, we propose a RIS-assisted internet of vehicles (IoV) network, considering the vehicle-to-everything (V2X) communication method. In addition, in order to improve the timeliness of vehicle-t…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at https://github.com/qiongwu86/RIS-RB-AoI-V2X-DRL.git

  22. arXiv:2406.09846  [pdf, ps, other]

    cs.IT eess.SP

    Multiple Intelligent Reflecting Surfaces Collaborative Wireless Localization System

    Authors: Ziheng Zhang, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Jingfeng Chen, Nan Cheng

    Abstract: This paper studies a multiple intelligent reflecting surfaces (IRSs) collaborative localization system where multiple semi-passive IRSs are deployed in the network to locate one or more targets based on time-of-arrival. It is assumed that each semi-passive IRS is equipped with reflective elements and sensors, which are used to establish the line-of-sight links from the base station (BS) to multipl…

    Submitted 17 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 13 pages, 8 figures

  23. arXiv:2406.08835  [pdf, other]

    cs.SD eess.AS

    EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed

    Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Ming Fang, Tao Wei, Zijian Li, Ning Cheng, Wei Hu, Shaojun Wang, Jing Xiao

    Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. In this paper, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EffectiveASR. It uses an Index Mappin…

    Submitted 28 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Submitted to ICASSP 2025

  24. arXiv:2406.07996  [pdf, other]

    cs.NI eess.SP

    Semantic-Aware Resource Allocation Based on Deep Reinforcement Learning for 5G-V2X HetNets

    Authors: Zhiyu Shao, Qiong Wu, Pingyi Fan, Nan Cheng, Qiang Fan, Jiangzhou Wang

    Abstract: This letter proposes a semantic-aware resource allocation (SARA) framework with flexible duty cycle (DC) coexistence mechanism (SARADC) for 5G-V2X Heterogeneous Networks (HetNets) based on deep reinforcement learning (DRL) proximal policy optimization (PPO). Specifically, we investigate V2X networks within a two-tiered HetNets structure. In response to the needs of high-speed vehicular networking i…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Letter. The source code has been released at: https://github.com/qiongwu86/Semantic-Aware-Resource-Allocation-Based-on-Deep-Reinforcement-Learning-for-5G-V2X-HetNets

  25. arXiv:2406.07857  [pdf, other]

    eess.SY cs.LG cs.NI

    Toward Enhanced Reinforcement Learning-Based Resource Management via Digital Twin: Opportunities, Applications, and Challenges

    Authors: Nan Cheng, Xiucheng Wang, Zan Li, Zhisheng Yin, Tom Luan, Xuemin Shen

    Abstract: This article presents a digital twin (DT)-enhanced reinforcement learning (RL) framework aimed at optimizing performance and reliability in network resource management, since the traditional RL methods face several unified challenges when applied to physical networks, including limited exploration efficiency, slow convergence, poor long-term performance, and safety concerns during the exploration…

    Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures

  26. arXiv:2406.07349  [pdf, other]

    cs.CR

    Erasing Radio Frequency Fingerprints via Active Adversarial Perturbation

    Authors: Zhaoyi Lu, Wenchao Xu, Ming Tu, Xin Xie, Cunqing Hua, Nan Cheng

    Abstract: Radio Frequency (RF) fingerprinting identifies a wireless device from the uniqueness of its analog circuitry or hardware imperfections. However, unlike the MAC address, which can be modified, such hardware features are unavoidable in the signal emitted over the air, which can possibly reveal device whereabouts, e.g., a sniffer can use a pre-trained model to identify a nearby device when receiving its s…

    Submitted 12 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  27. arXiv:2406.07213  [pdf, other]

    cs.LG

    Semantic-Aware Spectrum Sharing in Internet of Vehicles Based on Deep Reinforcement Learning

    Authors: Zhiyu Shao, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief

    Abstract: This work aims to investigate semantic communication in high-speed mobile Internet of vehicles (IoV) environments, with a focus on the spectrum sharing between vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. We specifically address spectrum scarcity and network traffic and then propose a semantic-aware spectrum sharing algorithm (SSS) based on the deep reinforcement le…

    Submitted 17 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/Semantic-Aware-Spectrum-Sharing-in-Internet-of-Vehicles-Based-on-Deep-Reinforcement-Learning

  28. arXiv:2406.03813  [pdf, other]

    cs.RO

    Touch100k: A Large-Scale Touch-Language-Vision Dataset for Touch-Centric Multimodal Representation

    Authors: Ning Cheng, Changhao Guan, Jing Gao, Weihao Wang, You Li, Fandong Meng, Jie Zhou, Bin Fang, Jinan Xu, Wenjuan Han

    Abstract: Touch holds a pivotal position in enhancing the perceptual and interactive capabilities of both humans and robots. Despite its significance, current tactile research mainly focuses on visual and tactile modalities, overlooking the language domain. Inspired by this, we construct Touch100k, a paired touch-language-vision dataset at the scale of 100k, featuring tactile sensation descriptions in multi…

    Submitted 6 June, 2024; originally announced June 2024.

  29. arXiv:2405.18692  [pdf, other]

    cs.IT eess.SP

    Movable Antenna Empowered Downlink NOMA Systems: Power Allocation and Antenna Position Optimization

    Authors: Yufeng Zhou, Wen Chen, Qingqing Wu, Xusheng Zhu, Nan Cheng

    Abstract: This paper investigates a novel communication paradigm employing movable antennas (MAs) within a multiple-input single-output (MISO) non-orthogonal multiple access (NOMA) downlink framework, where users are equipped with MAs. Initially, leveraging the far-field response, we delineate the channel characteristics concerning both the power allocation coefficient and positions of MAs. Subsequently, we…

    Submitted 7 August, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  30. arXiv:2405.17900  [pdf, other]

    cs.CL

    Enhancing Emotion Recognition in Conversation through Emotional Cross-Modal Fusion and Inter-class Contrastive Learning

    Authors: Haoxiang Shi, Xulong Zhang, Ning Cheng, Yong Zhang, Jun Yu, Jing Xiao, Jianzong Wang

    Abstract: The purpose of emotion recognition in conversation (ERC) is to identify the emotion category of an utterance based on contextual information. Previous ERC methods relied on simple connections for cross-modal fusion and ignored the information differences between modalities, resulting in the model being unable to focus on modality-specific emotional information. At the same time, the shared informa…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by the 20th International Conference on Intelligent Computing (ICIC 2024)

  31. arXiv:2405.17777  [pdf, other]

    cs.IR

    RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval

    Authors: Jianzong Wang, Haoxiang Shi, Kaiyi Luo, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Known for efficient computation and easy storage, hashing has been extensively explored in cross-modal retrieval. The majority of current hashing models are predicated on the premise of a direct one-to-one mapping between data points. However, in real practice, data correspondence across modalities may be partially provided. In this research, we introduce an innovative unsupervised hashing techniq…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by the 20th International Conference on Intelligent Computing (ICIC 2024)

  32. arXiv:2405.17028  [pdf, other]

    cs.SD eess.AS

    RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis

    Authors: Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao

    Abstract: Although current Text-To-Speech (TTS) models are able to generate high-quality speech samples, there are still challenges in developing emotion intensity controllable TTS. Most existing TTS models achieve emotion intensity control by extracting intensity information from reference speeches. Unfortunately, limited by the lack of modeling for intra-class emotion intensity and the model's information…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by the 8th APWeb-WAIM International Joint Conference on Web and Big Data

  33. arXiv:2405.12779  [pdf]

    cs.LG cs.AI

    Transformer in Touch: A Survey

    Authors: Jing Gao, Ning Cheng, Bin Fang, Wenjuan Han

    Abstract: The Transformer model, initially achieving significant success in the field of natural language processing, has recently shown great potential in the application of tactile perception. This review aims to comprehensively outline the application and development of Transformers in tactile technology. We first introduce the two fundamental concepts behind the success of the Transformer: the self-atte…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 27 pages, 2 tables, 5 figures, accepted by ICIC 2024

  34. arXiv:2405.06410  [pdf, other]

    cs.CL

    Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL

    Authors: Ning Cheng, Zhaohui Yan, Ziming Wang, Zhijie Li, Jiaming Yu, Zilong Zheng, Kewei Tu, Jinan Xu, Wenjuan Han

    Abstract: Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias. Nevertheless, an ongoing controversy exists over the extent to which LLMs can grasp structured semantics. To assess this, we propose using Semantic Role Labeling (SRL) as a fundamental task to explore LLMs' ability to extract structured se…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted by ICIC 2024

  35. arXiv:2405.00930  [pdf, other]

    cs.SD eess.AS

    MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion

    Authors: Pengcheng Li, Jianzong Wang, Xulong Zhang, Yong Zhang, Jing Xiao, Ning Cheng

    Abstract: One-shot voice conversion aims to change the timbre of any source speech to match that of the unseen target speaker with only one speech sample. Existing methods face difficulties in satisfactory speech representation disentanglement and suffer from sizable networks as some of them leverage numerous complex modules for disentanglement. In this paper, we propose a model named MAIN-VC to effectively…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  36. arXiv:2405.00603  [pdf, other]

    cs.SD eess.AS

    Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation

    Authors: Yimin Deng, Jianzong Wang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Voice conversion is the task of transforming the voice characteristics of source speech while preserving content information. Nowadays, self-supervised representation learning models are increasingly utilized in content extraction. However, in these representations, a lot of hidden speaker information leads to timbre leakage while the prosodic information of hidden units lacks use. To address these issue…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  37. arXiv:2404.19316  [pdf, other]

    cs.CL

    QLSC: A Query Latent Semantic Calibrator for Robust Extractive Question Answering

    Authors: Sheng Ouyang, Jianzong Wang, Yong Zhang, Zhitao Li, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Extractive Question Answering (EQA) in Machine Reading Comprehension (MRC) often faces the challenge of dealing with semantically identical but format-variant inputs. Our work introduces a novel approach, called the "Query Latent Semantic Calibrator (QLSC)", designed as an auxiliary module for existing MRC models. We propose a unique scaling strategy to capture latent semantic center features of…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  38. arXiv:2404.19214  [pdf, other]

    cs.SD eess.AS

    EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

    Authors: Jianzong Wang, Ziqi Liang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: In recent years, Transformer networks have shown remarkable performance in speech recognition tasks. However, their deployment poses challenges due to high computational and storage resource requirements. To address this issue, a lightweight model called EfficientASR is proposed in this paper, aiming to enhance the versatility of Transformer models. EfficientASR employs two primary modules: Shared…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  39. arXiv:2404.19212  [pdf, other]

    cs.SD eess.AS

    EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning

    Authors: Ziqi Liang, Jianzong Wang, Xulong Zhang, Yong Zhang, Ning Cheng, Jing Xiao

    Abstract: Using unsupervised learning to disentangle speech into content, rhythm, pitch, and timbre for voice conversion has become a hot research topic. Existing works generally take into account disentangling speech components through human-crafted bottleneck features which can not achieve sufficient information disentangling, while pitch and rhythm may still be mixed together. There is a risk of informat…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  40. arXiv:2404.19187  [pdf, other]

    cs.SD eess.AS

    CONTUNER: Singing Voice Beautifying with Pitch and Expressiveness Condition

    Authors: Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Singing voice beautifying is a novel task that has application value in people's daily life, aiming to correct the pitch of the singing voice and improve the expressiveness without changing the original timbre and content. Existing methods rely on paired data or only concentrate on the correction of pitch. However, professional songs and amateur songs from the same person are hard to obtain, and s…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  41. arXiv:2404.16130  [pdf, other]

    cs.CL cs.AI cs.IR

    From Local to Global: A Graph RAG Approach to Query-Focused Summarization

    Authors: Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Jonathan Larson

    Abstract: The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused s…

    Submitted 24 April, 2024; originally announced April 2024.

    ACM Class: H.3.3; I.2.7

  42. arXiv:2403.09813  [pdf, other]

    cs.CV cs.RO

    Towards Comprehensive Multimodal Perception: Introducing the Touch-Language-Vision Dataset

    Authors: Ning Cheng, You Li, Jing Gao, Bin Fang, Jinan Xu, Wenjuan Han

    Abstract: Tactility provides crucial support and enhancement for the perception and interaction capabilities of both humans and robots. Nevertheless, the multimodal research related to touch primarily focuses on visual and tactile modalities, with limited exploration in the domain of language. Beyond vocabulary, sentence-level descriptions contain richer semantics. Based on this, we construct a touch-langua…

    Submitted 17 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by ICIC 2024

  43. arXiv:2403.05000  [pdf, other]

    cs.AI

    Medical Speech Symptoms Classification via Disentangled Representation

    Authors: Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Intent is defined for understanding spoken language in existing works. Both textual features and acoustic features involved in medical speech contain intent, which is important for symptomatic diagnosis. In this paper, we propose a medical speech classification model named DRSC that automatically learns to disentangle intent and content representations from textual-acoustic data for classification…

    Submitted 29 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted by the 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2024)

  44. arXiv:2403.03681  [pdf, other]

    cs.RO cs.CV

    3D Object Visibility Prediction in Autonomous Driving

    Authors: Chuanyu Luo, Nuo Cheng, Ren Zhong, Haipeng Jiang, Wenyu Chen, Aoli Wang, Pu Li

    Abstract: With the rapid advancement of hardware and software technologies, research in autonomous driving has seen significant growth. The prevailing framework for multi-sensor autonomous driving encompasses sensor installation, perception, path planning, decision-making, and motion control. At the perception phase, a common approach involves utilizing neural networks to infer 3D bounding box (Bbox) attrib…

    Submitted 6 March, 2024; originally announced March 2024.

  45. arXiv:2402.15972  [pdf, other]

    cs.LG cs.NI

    Structural Knowledge-Driven Meta-Learning for Task Offloading in Vehicular Networks with Integrated Communications, Sensing and Computing

    Authors: Ruijin Sun, Yao Wen, Nan Cheng, Wei Wan, Rong Chai, Yilong Hui

    Abstract: Task offloading is a potential solution to satisfy the strict requirements of computation-intensive and latency-sensitive vehicular applications due to the limited onboard computing resources. However, the overwhelming upload traffic may lead to unacceptable uploading time. To tackle this issue, for tasks taking environmental data as input, the data perceived by roadside units (RSU) equipped with…

    Submitted 24 February, 2024; originally announced February 2024.

  46. arXiv:2402.15239  [pdf, other]

    cs.CV cs.LG

    GS-EMA: Integrating Gradient Surgery Exponential Moving Average with Boundary-Aware Contrastive Learning for Enhanced Domain Generalization in Aneurysm Segmentation

    Authors: Fengming Lin, Yan Xia, Michael MacRaild, Yash Deo, Haoran Dou, Qiongyao Liu, Nina Cheng, Nishant Ravikumar, Alejandro F. Frangi

    Abstract: The automated segmentation of cerebral aneurysms is pivotal for accurate diagnosis and treatment planning. Confronted with significant domain shifts and class imbalance in 3D Rotational Angiography (3DRA) data from various medical institutions, the task becomes challenging. These shifts include differences in image appearance, intensity distribution, resolution, and aneurysm size, all of which com…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Accepted by ISBI 2024

  47. arXiv:2402.11345  [pdf, other]

    stat.ML cs.LG math.OC

    Variational Entropy Search for Adjusting Expected Improvement

    Authors: Nuojin Cheng, Stephen Becker

    Abstract: Bayesian optimization is a widely used technique for optimizing black-box functions, with Expected Improvement (EI) being the most commonly utilized acquisition function in this domain. While EI is often viewed as distinct from other information-theoretic acquisition functions, such as entropy search (ES) and max-value entropy search (MES), our work reveals that EI can be considered a special case…

    Submitted 17 February, 2024; originally announced February 2024.

  48. arXiv:2402.03246  [pdf, other]

    cs.CV cs.AI cs.RO

    SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM

    Authors: Mingrui Li, Shuhong Liu, Heng Zhou, Guohao Zhu, Na Cheng, Tianchen Deng, Hongyu Wang

    Abstract: We present SGS-SLAM, the first semantic visual SLAM system based on Gaussian Splatting. It incorporates appearance, geometry, and semantic features through multi-channel optimization, addressing the oversmoothing limitations of neural implicit SLAM systems in high-quality rendering, scene understanding, and object-level geometry. We introduce a unique semantic feature loss that effectively compens…

    Submitted 26 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Journal ref: European Conference on Computer Vision (ECCV) 2024

  49. arXiv:2402.01665  [pdf, other]

    cs.NI cs.LG eess.SP

    Knowledge-Driven Deep Learning Paradigms for Wireless Network Optimization in 6G

    Authors: Ruijin Sun, Nan Cheng, Changle Li, Fangjiong Chen, Wen Chen

    Abstract: In the sixth-generation (6G) networks, newly emerging diversified services of massive users in dynamic network environments are required to be satisfied by multi-dimensional heterogeneous resources. The resulting large-scale complicated network optimization problems are beyond the capability of model-based theoretical methods due to the overwhelming computational complexity and the long processing…

    Submitted 15 January, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures

  50. arXiv:2402.00530  [pdf, other]

    cs.CL

    Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

    Authors: Ming Li, Yong Zhang, Shwai He, Zhitao Li, Hongyu Zhao, Jianzong Wang, Ning Cheng, Tianyi Zhou

    Abstract: Instruction tuning is critical to improve LLMs but usually suffers from low-quality and redundant data. Data filtering for instruction tuning has proved important in improving both the efficiency and performance of the tuning process. But it also leads to extra cost and computation due to the involvement of LLMs in this process. To reduce the filtering cost, we study Superfiltering: Can we use a s…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: ACL2024 main, Camera-ready