
Showing 1–50 of 7,268 results for author: Liu, Y

Searching in archive cs.
  1. arXiv:2409.02917  [pdf, other]

    cs.CV cs.AI

    UC-NeRF: Uncertainty-aware Conditional Neural Radiance Fields from Endoscopic Sparse Views

    Authors: Jiaxin Guo, Jiangliu Wang, Ruofeng Wei, Di Kang, Qi Dou, Yun-hui Liu

    Abstract: Visualizing surgical scenes is crucial for revealing internal anatomical structures during minimally invasive procedures. Novel View Synthesis is a vital technique that offers geometry and appearance reconstruction, enhancing understanding, planning, and decision-making in surgical scenes. Despite the impressive achievements of Neural Radiance Field (NeRF), its direct application to surgical scene…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.02715  [pdf, other]

    cs.CV cs.CR cs.LG

    Recoverable Anonymization for Pose Estimation: A Privacy-Enhancing Approach

    Authors: Wenjun Huang, Yang Ni, Arghavan Rezvani, SungHeon Jeong, Hanning Chen, Yezi Liu, Fei Wen, Mohsen Imani

    Abstract: Human pose estimation (HPE) is crucial for various applications. However, deploying HPE algorithms in surveillance contexts raises significant privacy concerns due to the potential leakage of sensitive personal information (SPI) such as facial features and ethnicity. Existing privacy-enhancing methods often compromise either privacy or performance, or they require costly additional modalities. We…

    Submitted 1 September, 2024; originally announced September 2024.

  3. arXiv:2409.02669  [pdf, other]

    cs.RO cs.AI cs.LG

    Causality-Aware Transformer Networks for Robotic Navigation

    Authors: Ruoyu Wang, Yao Liu, Yuanjiang Cao, Lina Yao

    Abstract: Recent advances in machine learning algorithms have garnered growing interest in developing versatile Embodied AI systems. However, current research in this domain reveals opportunities for improvement. First, the direct adoption of RNNs and Transformers often overlooks the specific differences between Embodied AI and traditional sequential data modelling, potentially limiting its performance in E…

    Submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2409.02451  [pdf, other]

    eess.AS cs.AI cs.SD

    Fast, High-Quality and Parameter-Efficient Articulatory Synthesis using Differentiable DSP

    Authors: Yisi Liu, Bohan Yu, Drake Lin, Peter Wu, Cheol Jun Cho, Gopala Krishna Anumanchipalli

    Abstract: Articulatory trajectories like electromagnetic articulography (EMA) provide a low-dimensional representation of the vocal tract filter and have been used as natural, grounded features for speech synthesis. Differentiable digital signal processing (DDSP) is a parameter-efficient framework for audio synthesis. Therefore, integrating low-dimensional EMA features with DDSP can significantly enhance th…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: accepted for Spoken Language Technology Workshop 2024

  5. arXiv:2409.02375  [pdf, other]

    cs.CL

    How Privacy-Savvy Are Large Language Models? A Case Study on Compliance and Privacy Technical Review

    Authors: Xichou Zhu, Yang Liu, Zhou Shen, Yi Liu, Min Li, Yujun Chen, Benzi John, Zhenzhen Ma, Tao Hu, Bolong Yang, Manman Wang, Zongxing Xie, Peng Liu, Dan Cai, Junhui Wang

    Abstract: The recent advances in large language models (LLMs) have significantly expanded their applications across various fields such as language generation, summarization, and complex question answering. However, their application to privacy compliance and technical privacy reviews remains under-explored, raising critical concerns about their ability to adhere to global privacy standards and protect sens…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 8 pages, 4 figures

  6. arXiv:2409.02370  [pdf, other]

    cs.CL cs.AI

    Do Large Language Models Possess Sensitive to Sentiment?

    Authors: Yang Liu, Xichou Zhu, Zhou Shen, Yi Liu, Min Li, Yujun Chen, Benzi John, Zhenzhen Ma, Tao Hu, Zhiyang Xu, Wei Luo, Junhui Wang

    Abstract: Large Language Models (LLMs) have recently displayed their extraordinary capabilities in language understanding. However, how to comprehensively assess the sentiment capabilities of LLMs continues to be a challenge. This paper investigates the ability of LLMs to detect and react to sentiment in text modal. As the integration of LLMs into diverse applications is on the rise, it becomes highly criti…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 10 pages, 2 figures

  7. arXiv:2409.02322  [pdf, other]

    cs.LG cs.AI

    TimeDiT: General-purpose Diffusion Transformers for Time Series Foundation Model

    Authors: Defu Cao, Wen Ye, Yizhou Zhang, Yan Liu

    Abstract: With recent advances in building foundation models for texts and video data, there is a surge of interest in foundation models for time series. A family of models have been developed, utilizing a temporal auto-regressive generative Transformer architecture, whose effectiveness has been proven in Large Language Models. While the empirical results are promising, almost all existing time series found…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 23 Pages, 6 Figures, 11 Tables. First presented at ICML 2024 Workshop on Foundation Models in the Wild

  8. arXiv:2409.02139  [pdf, other]

    cs.LG cs.AI cs.CR

    The Role of Transformer Models in Advancing Blockchain Technology: A Systematic Survey

    Authors: Tianxu Liu, Yanbin Wang, Jianguo Sun, Ye Tian, Yanyu Huang, Tao Xue, Peiyue Li, Yiwei Liu

    Abstract: As blockchain technology rapidly evolves, the demand for enhanced efficiency, security, and scalability grows. Transformer models, as powerful deep learning architectures, have shown unprecedented potential in addressing various blockchain challenges. However, a systematic review of Transformer applications in blockchain is lacking. This paper aims to fill this research gap by surveying over 200 rel…

    Submitted 5 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

  9. arXiv:2409.02123  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    PuYun: Medium-Range Global Weather Forecasting Using Large Kernel Attention Convolutional Networks

    Authors: Shengchen Zhu, Yiming Chen, Peiying Yu, Xiang Qu, Yuxiao Zhou, Yiming Ma, Zhizhan Zhao, Yukai Liu, Hao Mi, Bin Wang

    Abstract: Accurate weather forecasting is essential for understanding and mitigating weather-related impacts. In this paper, we present PuYun, an autoregressive cascade model that leverages large kernel attention convolutional networks. The model's design inherently supports extended weather prediction horizons while broadening the effective receptive field. The integration of large kernel attention mechani…

    Submitted 1 September, 2024; originally announced September 2024.

  10. arXiv:2409.02118  [pdf, other]

    cs.LG cs.AI cs.CL

    TSO: Self-Training with Scaled Preference Optimization

    Authors: Kaihui Chen, Hao Yi, Qingyang Li, Tianyu Qi, Yulan Hu, Fuzheng Zhang, Yong Liu

    Abstract: Enhancing the conformity of large language models (LLMs) to human preferences remains an ongoing research challenge. Recently, offline approaches such as Direct Preference Optimization (DPO) have gained prominence as attractive options because they offer effective improvement in a simple, efficient, and stable manner without interactions with reward models. However, these offline preference optimization metho…

    Submitted 31 August, 2024; originally announced September 2024.

  11. arXiv:2409.02074  [pdf, other]

    cs.CR cs.HC cs.LG cs.SE

    RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer

    Authors: Jiangyi Deng, Xinfeng Li, Yanjiao Chen, Yijie Bai, Haiqin Weng, Yan Liu, Tao Wei, Wenyuan Xu

    Abstract: Malicious shell commands are linchpins to many cyber-attacks, but may not be easy to understand by security analysts due to complicated and often disguised code structures. Advances in large language models (LLMs) have unlocked the possibility of generating understandable explanations for shell commands. However, existing general-purpose LLMs suffer from a lack of expert knowledge and a tendency t…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by NDSS Symposium 2025. Please cite this paper as "Jiangyi Deng, Xinfeng Li, Yanjiao Chen, Yijie Bai, Haiqin Weng, Yan Liu, Tao Wei, Wenyuan Xu. RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer. In the 32nd Annual Network and Distributed System Security Symposium (NDSS 2025)."

  12. arXiv:2409.02010  [pdf, other]

    quant-ph cs.ET

    Ternary Tree Fermion-to-Qubit Mapping with Hamiltonian Aware Optimization

    Authors: Yuhao Liu, Kevin Yao, Jonathan Hong, Julien Froustey, Yunong Shi, Ermal Rrapaj, Costin Iancu, Gushu Li

    Abstract: This paper introduces the Hamiltonian-Aware Ternary Tree (HATT) framework to compile optimized Fermion-to-qubit mapping for specific Fermionic Hamiltonians. In the simulation of Fermionic quantum systems, efficient Fermion-to-qubit mapping plays a critical role in transforming the Fermionic system into a qubit system. HATT utilizes ternary tree mapping and a bottom-up construction procedure to gen…

    Submitted 3 September, 2024; originally announced September 2024.

  13. arXiv:2409.01710  [pdf, other]

    cs.MM

    Privacy-Preserving Multimedia Mobile Cloud Computing Using Protective Perturbation

    Authors: Zhongze Tang, Mengmei Ye, Yao Liu, Sheng Wei

    Abstract: Mobile cloud computing has been adopted in many multimedia applications, where the resource-constrained mobile device sends multimedia data (e.g., images) to remote cloud servers to request computation-intensive multimedia services (e.g., image recognition). While significantly improving the performance of the mobile applications, the cloud-based mechanism often causes privacy concerns as the mult…

    Submitted 3 September, 2024; originally announced September 2024.

  14. arXiv:2409.01691  [pdf, other]

    cs.CV

    When 3D Partial Points Meets SAM: Tooth Point Cloud Segmentation with Sparse Labels

    Authors: Yifan Liu, Wuyang Li, Cheng Wang, Hui Chen, Yixuan Yuan

    Abstract: Tooth point cloud segmentation is a fundamental task in many orthodontic applications. Current research mainly focuses on fully supervised learning which demands expensive and tedious manual point-wise annotation. Although recent weakly-supervised alternatives are proposed to use weak labels for 3D segmentation and achieve promising results, they tend to fail when the labels are extremely sparse.…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: To appear at MICCAI24

  15. arXiv:2409.01524  [pdf, other]

    cs.CL cs.AI

    S$^3$c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners

    Authors: Yuchen Yan, Jin Jiang, Yang Liu, Yixin Cao, Xin Xu, Mengdi Zhang, Xunliang Cai, Jian Shao

    Abstract: Self-correction is a novel method that can stimulate the potential reasoning abilities of large language models (LLMs). It involves detecting and correcting errors during the inference process when LLMs solve reasoning problems. However, recent works do not regard self-correction as a spontaneous and intrinsic capability of LLMs. Instead, such correction is achieved through post-hoc generation, ex…

    Submitted 2 September, 2024; originally announced September 2024.

  16. arXiv:2409.01251  [pdf, ps, other]

    cs.LG cs.DC

    GAS: Generative Activation-Aided Asynchronous Split Federated Learning

    Authors: Jiarong Yang, Yuan Liu

    Abstract: Split Federated Learning (SFL) splits and collaboratively trains a shared model between clients and server, where clients transmit activations and client-side models to server for updates. Recent SFL studies assume synchronous transmission of activations and client-side models from clients to server. However, due to significant variations in computational and communication capabilities among clien…

    Submitted 2 September, 2024; originally announced September 2024.

  17. arXiv:2409.01179  [pdf, other]

    cs.CV

    Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information

    Authors: Yi Chen, Jian Xu, Xu-Yao Zhang, Wen-Zhuo Liu, Yang-Yang Liu, Cheng-Lin Liu

    Abstract: With the advancement of large-scale language modeling techniques, large multimodal models combining visual encoders with large language models have demonstrated exceptional performance in various visual tasks. Most of the current large-scale multimodal models achieve this by mapping visual features obtained from the visual encoder into a large language model and using them as inputs alongside text…

    Submitted 2 September, 2024; originally announced September 2024.

  18. arXiv:2409.01073  [pdf, other]

    cs.CV cs.AI cs.CL

    SCOPE: Sign Language Contextual Processing with Embedding from LLMs

    Authors: Yuqi Liu, Wenqian Zhang, Sihan Ren, Chengyu Huang, Jingyi Yu, Lan Xu

    Abstract: Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information. Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information. To address these challenges, we introduce SCOPE (Sign langua…

    Submitted 2 September, 2024; originally announced September 2024.

  19. arXiv:2409.01071  [pdf, other]

    cs.CV cs.CL

    VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

    Authors: Yuxuan Wang, Cihang Xie, Yang Liu, Zilong Zheng

    Abstract: Recent advancements in large-scale video-language models have shown significant potential for real-time planning and detailed interactions. However, their high computational demands and the scarcity of annotated datasets limit their practicality for academic researchers. In this work, we introduce VideoLLaMB, a novel framework that utilizes temporal memory tokens within bridge layers to allow for…

    Submitted 2 September, 2024; originally announced September 2024.

  20. arXiv:2409.01068  [pdf, other]

    cs.CV

    Progressive Retinal Image Registration via Global and Local Deformable Transformations

    Authors: Yepeng Liu, Baosheng Yu, Tian Chen, Yuliang Gu, Bo Du, Yongchao Xu, Jun Cheng

    Abstract: Retinal image registration plays an important role in the ophthalmological diagnosis process. Since there exist variances in viewing angles and anatomical structures across different retinal images, keypoint-based approaches become the mainstream methods for retinal image registration thanks to their robustness and low latency. These methods typically assume the retinal surfaces are planar, and ad…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted at BIBM 2024

  21. arXiv:2409.01027  [pdf]

    cs.HC

    Mindscape: Research of high-information density street environments based on electroencephalogram recording and virtual reality head-mounted simulation

    Authors: Yijiang Liu, Xiangyu Guan, Hui Wang, Lun Liu

    Abstract: This study aims to investigate, through neuroscientific methods, the effects of particular architectural elements on pedestrian spatial cognition and experience in the analysis and design of walking street spaces. More precisely, this paper will describe the impact of the density variation of storefront signs on the brainwaves of passersby in East Asian city walking streets, providing strategies a…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 10 pages, 10 figures, This paper has been accepted at the eCAADe 2024 Conference

    ACM Class: J.6

  22. arXiv:2409.00982  [pdf, other]

    cs.HC

    Experimental Analysis of Freehand Multi-Object Selection Techniques in Virtual Reality Head-Mounted Displays

    Authors: Rongkai Shi, Yushi Wei, Xuning Hu, Yu Liu, Yong Yue, Lingyun Yu, Hai-Ning Liang

    Abstract: Object selection is essential in virtual reality (VR) head-mounted displays (HMDs). Prior work mainly focuses on enhancing and evaluating techniques for selecting a single object in VR, leaving a gap in the techniques for multi-object selection, a more complex but common selection scenario. To enable multi-object selection, the interaction technique should support group selection in addition to th…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: To be presented at ACM ISS 2024

  23. arXiv:2409.00962  [pdf, other]

    cs.HC

    Mental-Gen: A Brain-Computer Interface-Based Interactive Method for Interior Space Generative Design

    Authors: Yijiang Liu, Hui Wang

    Abstract: Interior space design significantly influences residents' daily lives. However, the process often presents high barriers and complex reasoning for users, leading to semantic losses in articulating comprehensive requirements and communicating them to designers. This study proposes the Mental-Gen design method, which focuses on interpreting users' spatial design intentions at neural level and expres…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 18 pages, 8 figures

    ACM Class: J.6

  24. arXiv:2409.00920  [pdf, other]

    cs.LG cs.AI cs.CL

    ToolACE: Winning the Points of LLM Function Calling

    Authors: Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong Liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, et al. (2 additional authors not shown)

    Abstract: Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic ag…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 21 pages, 22 figures

  25. arXiv:2409.00899  [pdf, other]

    cs.SE cs.AI

    MarsCode Agent: AI-native Automated Bug Fixing

    Authors: Yizhou Liu, Pengfei Gao, Xinchen Wang, Jie Liu, Yexuan Shi, Zhao Zhang, Chao Peng

    Abstract: Recent advances in large language models (LLMs) have shown significant potential to automate various software development tasks, including code completion, test generation, and bug fixing. However, the application of LLMs for automated bug fixing remains challenging due to the complexity and diversity of real-world software systems. In this paper, we introduce MarsCode Agent, a novel framework tha…

    Submitted 4 September, 2024; v1 submitted 1 September, 2024; originally announced September 2024.

    Comments: Yizhou Liu and Pengfei Gao contributed equally and the order is determined by rolling the dice. Chao Peng is the corresponding author

  26. arXiv:2409.00877  [pdf, other]

    cs.CV

    Digital Twins in Additive Manufacturing: A Systematic Review

    Authors: Md Manjurul Ahsan, Benjamin Bevans, Chris Billings, Alexander Riensche, Yingtao Liu, Shivakumar Raman, Zahed Siddique

    Abstract: Digital Twins (DTs) are becoming popular in Additive Manufacturing (AM) due to their ability to create virtual replicas of physical components of AM machines, which helps in real-time production monitoring. Advanced techniques such as Machine Learning (ML), Augmented Reality (AR), and simulation-based models play key roles in developing intelligent and adaptable DTs in manufacturing processes. How…

    Submitted 1 September, 2024; originally announced September 2024.

  27. arXiv:2409.00671  [pdf, other]

    cs.CE

    InvariantStock: Learning Invariant Features for Mastering the Shifting Market

    Authors: Haiyao Cao, Jinan Zou, Yuhang Liu, Zhen Zhang, Ehsan Abbasnejad, Anton van den Hengel, Javen Qinfeng Shi

    Abstract: Accurately predicting stock returns is crucial for effective portfolio management. However, existing methods often overlook a fundamental issue in the market, namely, distribution shifts, making them less practical for predicting future markets or newly listed stocks. This study introduces a novel approach to address this challenge by focusing on the acquisition of invariant features across variou…

    Submitted 1 September, 2024; originally announced September 2024.

  28. arXiv:2409.00620  [pdf, other]

    cs.CV cs.AI

    Enhancing Vectorized Map Perception with Historical Rasterized Maps

    Authors: Xiaoyu Zhang, Guangwei Liu, Zihao Liu, Ningyi Xu, Yunhui Liu, Ji Zhao

    Abstract: In autonomous driving, there is growing interest in end-to-end online vectorized map perception in bird's-eye-view (BEV) space, with an expectation that it could replace traditional high-cost offline high-definition (HD) maps. However, the accuracy and robustness of these methods can be easily compromised in challenging conditions, such as occlusion or adverse weather, when relying only on onboard…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV 2024

  29. arXiv:2409.00618  [pdf, other]

    cs.CV

    YOLOO: You Only Learn from Others Once

    Authors: Lipeng Gu, Mingqiang Wei, Xuefeng Yan, Dingkun Zhu, Wei Zhao, Haoran Xie, Yong-Jin Liu

    Abstract: Multi-modal 3D multi-object tracking (MOT) typically necessitates extensive computational costs of deep neural networks (DNNs) to extract multi-modal representations. In this paper, we propose an intriguing question: May we learn from multiple modalities only during training to avoid multi-modal input in the inference phase? To answer it, we propose YOLOO, a novel multi-modal 3D MOT parad…

    Submitted 1 September, 2024; originally announced September 2024.

  30. arXiv:2409.00606  [pdf, other]

    cs.CV

    Style Transfer: From Stitching to Neural Networks

    Authors: Xinhe Xu, Zhuoer Wang, Yihan Zhang, Yizhou Liu, Zhaoyue Wang, Zhihao Xu, Muhan Zhao

    Abstract: This article compares two style transfer methods in image processing: the traditional method, which synthesizes new images by stitching together small patches from existing images, and a modern machine learning-based approach that uses a segmentation network to isolate foreground objects and apply style transfer solely to the background. The traditional method excels in creating artistic abstracti…

    Submitted 1 September, 2024; originally announced September 2024.

  31. arXiv:2409.00509  [pdf, other]

    cs.CL

    LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

    Authors: Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi

    Abstract: Large language models (LLMs) face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences. Meanwhile, extending the context window in LLMs through post-pretraining is highly resource-intensive. To address this, we introduce LongRecipe, an efficient training s…

    Submitted 4 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: Work in Progress

  32. arXiv:2409.00499  [pdf, other]

    cs.RO cs.CV

    DAP: Diffusion-based Affordance Prediction for Multi-modality Storage

    Authors: Haonan Chang, Kowndinya Boyalakuntla, Yuhan Liu, Xinyu Zhang, Liam Schramm, Abdeslam Boularias

    Abstract: Solving the storage problem, where objects must be accurately placed into containers with precise orientations and positions, presents a distinct challenge that extends beyond traditional rearrangement tasks. These challenges are primarily due to the need for fine-grained 6D manipulation and the inherent multi-modality of solution spaces, where multiple viable goal configurations exist for the same st…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: Paper Accepted by IROS2024. Arxiv version is 8 pages

  33. arXiv:2409.00402  [pdf, ps, other]

    cs.IT eess.SP

    Generalized Orthogonal Chirp Division Multiplexing in Doubly Selective Channels

    Authors: Yun Liu, Hao Zhao, Huazhen Yao, Zeng Hu, Yinming Cui, Dehuan Wan

    Abstract: In recent years, orthogonal chirp division modulation (OCDM) has gained attention as a robust communication waveform due to its strong resistance to both time-domain and frequency-domain interference. However, similar to orthogonal frequency division multiplexing (OFDM), OCDM suffers from a high peak-to-average power ratio (PAPR), resulting in increased hardware costs and reduced energy efficiency…

    Submitted 31 August, 2024; originally announced September 2024.

  34. arXiv:2409.00356  [pdf, other]

    cs.SD cs.AI eess.AS

    Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology

    Authors: Weinan Dai, Yifeng Jiang, Yuanjing Liu, Jinkun Chen, Xin Sun, Jinglei Tao

    Abstract: This paper addresses the persistent challenge in Keyword Spotting (KWS), a fundamental component in speech technology, regarding the acquisition of substantial labeled data for training. Given the difficulty in obtaining large quantities of positive samples and the laborious process of collecting new target samples when the keyword changes, we introduce a novel approach combining unsupervised cont…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: This paper has been accepted by the ICPR2024

  35. arXiv:2409.00324  [pdf, other]

    cs.NI

    User-centric Service Provision for Edge-assisted Mobile AR: A Digital Twin-based Approach

    Authors: Conghao Zhou, Jie Gao, Yixiang Liu, Shisheng Hu, Nan Cheng, Xuemin Shen

    Abstract: Future 6G networks are envisioned to support mobile augmented reality (MAR) applications and provide customized immersive experiences for users via advanced service provision. In this paper, we investigate user-centric service provision for edge-assisted MAR to support the timely camera frame uploading of an MAR device by optimizing the spectrum resource reservation. To address the challenge of no…

    Submitted 30 August, 2024; originally announced September 2024.

  36. arXiv:2409.00143  [pdf, other]

    cs.LG cs.AI cs.CV

    Robust Temporal-Invariant Learning in Multimodal Disentanglement

    Authors: Guoyang Xu, Junqi Xue, Zhenxi Song, Yuxin Liu, Zirui Wang, Min Zhang, Zhiguo Zhang

    Abstract: Multimodal sentiment recognition aims to learn representations from different modalities to identify human emotions. However, previous works do not suppress the frame-level redundancy inherent in continuous time series, resulting in incomplete modality representations with noise. To address this issue, we propose Temporal-invariant learning, which minimizes the distributional differences b…

    Submitted 29 August, 2024; originally announced September 2024.

    Comments: 5 pages, 2 figures, this is the first version. The code is available at https://github.com/X-G-Y/RTIL

  37. arXiv:2409.00138  [pdf, other]

    cs.CL cs.AI cs.CR

    PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action

    Authors: Yijia Shao, Tianshi Li, Weiyan Shi, Yanchen Liu, Diyi Yang

    Abstract: As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in accordance with the contextual privacy norms becomes increasingly critical. However, quantifying the privacy norm awareness of LMs and the emerging privacy risk in LM-mediated communication is challe…

    Submitted 29 August, 2024; originally announced September 2024.

    Comments: Under review

  38. arXiv:2408.17439  [pdf, ps, other]

    quant-ph cs.CC

    Quantum state testing with restricted measurements

    Authors: Yuhan Liu, Jayadev Acharya

    Abstract: We study quantum state testing where the goal is to test whether $ρ=ρ_0\in\mathbb{C}^{d\times d}$ or $\|ρ-ρ_0\|_1>\varepsilon$, given $n$ copies of $ρ$ and a known state description $ρ_0$. In practice, not all measurements can be easily applied, even using unentangled measurements where each copy is measured separately. We develop an information-theoretic framework that yields unified copy complex…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 43 pages. Part of the work was published at COLT 2024. arXiv admin note: text overlap with arXiv:2401.09650

  39. arXiv:2408.17437  [pdf, other]

    cs.CL

    SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists

    Authors: Raoyuan Zhao, Abdullatif Köksal, Yihong Liu, Leonie Weissweiler, Anna Korhonen, Hinrich Schütze

    Abstract: Traditional benchmarking in NLP typically involves using static held-out test sets. However, this approach often results in an overestimation of performance and lacks the ability to offer comprehensive, interpretable, and dynamic assessments of NLP models. Recently, works like DynaBench (Kiela et al., 2021) and CheckList (Ribeiro et al., 2020) have addressed these limitations through behavioral te…

    Submitted 30 August, 2024; originally announced August 2024.

  40. arXiv:2408.17355  [pdf, other]

    cs.RO cs.AI cs.LG

    Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling

    Authors: Yuejiang Liu, Jubayer Ibn Hamid, Annie Xie, Yoonho Lee, Maximilian Du, Chelsea Finn

    Abstract: Predicting and executing a sequence of actions without intermediate replanning, known as action chunking, is increasingly used in robot learning from human demonstrations. However, its effects on learned policies remain puzzling: some studies highlight its importance for achieving strong performance, while others observe detrimental effects. In this paper, we first dissect the role of action chunk…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Project website: https://bid-robot.github.io/

  41. arXiv:2408.17135  [pdf, other]

    cs.CV

    Temporal and Interactive Modeling for Efficient Human-Human Motion Generation

    Authors: Yabiao Wang, Shuo Wang, Jiangning Zhang, Ke Fan, Jiafu Wu, Zhengkai Jiang, Yong Liu

    Abstract: Human-human motion generation is essential for understanding humans as social beings. Although several transformer-based methods have been proposed, they typically model each individual separately and overlook the causal relationships in temporal motion sequences. Furthermore, the attention mechanism in transformers exhibits quadratic computational complexity, significantly reducing their efficien…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Homepage: https://aigc-explorer.github.io/TIM-page/

  42. arXiv:2408.16975  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Technical Report of HelixFold3 for Biomolecular Structure Prediction

    Authors: Lihang Liu, Shanzhuo Zhang, Yang Xue, Xianbin Ye, Kunrui Zhu, Yuxin Li, Yang Liu, Xiaonan Zhang, Xiaomin Fang

    Abstract: The AlphaFold series has transformed protein structure prediction with remarkable accuracy, often matching experimental methods. AlphaFold2, AlphaFold-Multimer, and the latest AlphaFold3 represent significant strides in predicting single protein chains, protein complexes, and biomolecular structures. While AlphaFold2 and AlphaFold-Multimer are open-sourced, facilitating rapid and reliable predicti…

    Submitted 29 August, 2024; originally announced August 2024.

  43. arXiv:2408.16886  [pdf, other]

    eess.IV cs.CV

    LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation

    Authors: Juntao Jiang, Mengmeng Wang, Huizhong Tian, Lingbo Cheng, Yong Liu

    Abstract: Although the progress made by large models in computer vision, optimization challenges, the complexity of transformer models, computational limitations, and the requirements of practical applications call for simpler designs in model architecture for medical image segmentation, especially in mobile medical devices that require lightweight and deployable models with real-time performance. However,…

    Submitted 29 August, 2024; originally announced August 2024.

  44. ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding

    Authors: Minghang Zheng, Jiahua Zhang, Qingchao Chen, Yuxin Peng, Yang Liu

    Abstract: Visual grounding aims to localize the object referred to in an image based on a natural language query. Although progress has been made recently, accurately localizing target objects within multiple-instance distractions (multiple objects of the same category as the target) remains a significant challenge. Existing methods demonstrate a significant performance drop when there are multiple distract…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

    ACM Class: I.2

  45. arXiv:2408.16247  [pdf, other]

    cs.CV

    Anno-incomplete Multi-dataset Detection

    Authors: Yiran Xu, Haoxiang Zhong, Kai Wu, Jialin Li, Yong Liu, Chengjie Wang, Shu-Tao Xia, Hongen Liao

    Abstract: Object detectors have shown outstanding performance on various public datasets. However, annotating a new dataset for a new task is usually unavoidable in real, since 1) a single existing dataset usually does not contain all object categories needed; 2) using multiple datasets usually suffers from annotation incompletion and heterogeneous features. We propose a novel problem as "Annotation-incompl…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 12 pages, 9 figures

  46. arXiv:2408.16219  [pdf, other]

    cs.CV

    Training-free Video Temporal Grounding using Large-scale Pre-trained Models

    Authors: Minghang Zheng, Xinhao Cai, Qingchao Chen, Yuxin Peng, Yang Liu

    Abstract: Video temporal grounding aims to identify video segments within untrimmed videos that are most relevant to a given natural language query. Existing video temporal localization models rely on specific datasets for training and have high data collection costs, but they exhibit poor generalization capability under the across-dataset and out-of-distribution (OOD) settings. In this paper, we propose a…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  47. arXiv:2408.16030  [pdf]

    cs.SD cs.AI cs.LG eess.AS

    A Deep Learning Approach to Localizing Multi-level Airway Collapse Based on Snoring Sounds

    Authors: Ying-Chieh Hsu, Stanley Yung-Chuan Liu, Chao-Jung Huang, Chi-Wei Wu, Ren-Kai Cheng, Jane Yung-Jen Hsu, Shang-Ran Huang, Yuan-Ren Cheng, Fu-Shun Hsu

    Abstract: This study investigates the application of machine/deep learning to classify snoring sounds excited at different levels of the upper airway in patients with obstructive sleep apnea (OSA) using data from drug-induced sleep endoscopy (DISE). The snoring sounds of 39 subjects were analyzed and labeled according to the Velum, Oropharynx, Tongue Base, and Epiglottis (VOTE) classification system. The da…

    Submitted 28 August, 2024; originally announced August 2024.

  48. arXiv:2408.15947  [pdf, other]

    eess.IV cs.CV

    Auxiliary Input in Training: Incorporating Catheter Features into Deep Learning Models for ECG-Free Dynamic Coronary Roadmapping

    Authors: Yikang Liu, Lin Zhao, Eric Z. Chen, Xiao Chen, Terrence Chen, Shanhui Sun

    Abstract: Dynamic coronary roadmapping is a technology that overlays the vessel maps (the "roadmap") extracted from an offline image sequence of X-ray angiography onto a live stream of X-ray fluoroscopy in real-time. It aims to offer navigational guidance for interventional surgeries without the need for repeated contrast agent injections, thereby reducing the risks associated with radiation exposure and ki…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: MICCAI 2024

  49. arXiv:2408.15916  [pdf, other]

    eess.AS cs.LG cs.SD

    Multi-modal Adversarial Training for Zero-Shot Voice Cloning

    Authors: John Janiczek, Dading Chong, Dongyang Dai, Arlo Faria, Chao Wang, Tao Wang, Yuzong Liu

    Abstract: A text-to-speech (TTS) model trained to reconstruct speech given text tends towards predictions that are close to the average characteristics of a dataset, failing to model the variations that make human speech sound natural. This problem is magnified for zero-shot voice cloning, a task that requires training data with high variance in speaking styles. We build off of recent works which have used…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted at INTERSPEECH 2024

  50. arXiv:2408.15813  [pdf, other]

    cs.CV

    DQFormer: Towards Unified LiDAR Panoptic Segmentation with Decoupled Queries

    Authors: Yu Yang, Jianbiao Mei, Liang Liu, Siliang Du, Yilin Xiao, Jongwon Ra, Yong Liu, Xiao Xu, Huifeng Wu

    Abstract: LiDAR panoptic segmentation, which jointly performs instance and semantic segmentation for things and stuff classes, plays a fundamental role in LiDAR perception tasks. While most existing methods explicitly separate these two segmentation tasks and utilize different branches (i.e., semantic and instance branches), some recent methods have embraced the query-based paradigm to unify LiDAR panoptic…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 13 pages, 10 figures