Showing 1–50 of 906 results for author: Gao, X

Searching in archive cs.
  1. arXiv:2409.02616  [pdf, other]

    cs.IT

    Group Information Geometry Approach for Ultra-Massive MIMO Signal Detection

    Authors: Jiyuan Yang, Yan Chen, Xiqi Gao, Xiang-Gen Xia, Dirk Slock

    Abstract: We propose a group information geometry approach (GIGA) for ultra-massive multiple-input multiple-output (MIMO) signal detection. The signal detection task is framed as computing the approximate marginals of the a posteriori distribution of the transmitted data symbols of all users. With the approximate marginals, we perform the maximization of the a posteriori marginals (MPM) detection…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.02438  [pdf, other]

    cs.CV

    Non-target Divergence Hypothesis: Toward Understanding Domain Gaps in Cross-Modal Knowledge Distillation

    Authors: Yilong Chen, Zongyi Xu, Xiaoshui Huang, Shanshan Zhao, Xinqi Jiang, Xinyu Gao, Xinbo Gao

    Abstract: Compared to single-modal knowledge distillation, cross-modal knowledge distillation faces more severe challenges due to domain gaps between modalities. Although many methods have been proposed to overcome these challenges, there is still limited research on how domain gaps affect cross-modal knowledge distillation. This paper provides an in-depth analysis and evaluation of this iss…

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2409.02095  [pdf, other]

    cs.CV cs.AI cs.GR

    DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

    Authors: Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, Ying Shan

    Abstract: Despite significant advancements in monocular depth estimation for static images, estimating video depth in the open world remains challenging, since open-world videos are extremely diverse in content, motion, camera movement, and length. We present DepthCrafter, an innovative method for generating temporally consistent long depth sequences with intricate details for open-world videos, without req…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Project webpage: https://depthcrafter.github.io

  4. arXiv:2409.02048  [pdf, other]

    cs.CV

    ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis

    Authors: Wangbo Yu, Jinbo Xing, Li Yuan, Wenbo Hu, Xiaoyu Li, Zhipeng Huang, Xiangjun Gao, Tien-Tsin Wong, Ying Shan, Yonghong Tian

    Abstract: Despite recent advancements in neural 3D reconstruction, the dependence on dense multi-view captures restricts their broader applicability. In this work, we propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images with the prior of a video diffusion model. Our method takes advantage of the powerful generation capabilities…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Project page: https://drexubery.github.io/ViewCrafter/

  5. arXiv:2409.00088  [pdf, other]

    cs.CL

    On-Device Language Models: A Comprehensive Review

    Authors: Jiajun Xu, Zhiyuan Li, Wei Chen, Qun Wang, Xin Gao, Qi Cai, Ziyuan Ling

    Abstract: The advent of large language models (LLMs) revolutionized natural language processing applications, and running LLMs on edge devices has become increasingly attractive for reasons including reduced latency, data localization, and personalized user experiences. This comprehensive review examines the challenges of deploying computationally expensive LLMs on resource-constrained devices and explores…

    Submitted 25 August, 2024; originally announced September 2024.

    Comments: 38 pages, 6 figures

  6. arXiv:2408.17168  [pdf, other]

    cs.CV

    EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs

    Authors: Zhen Fan, Peng Dai, Zhuo Su, Xu Gao, Zheng Lv, Jiarui Zhang, Tianyuan Du, Guidong Wang, Yang Zhang

    Abstract: Egocentric human pose estimation (HPE) using wearable sensors is essential for VR/AR applications. Most methods rely solely on either egocentric-view images or sparse Inertial Measurement Unit (IMU) signals, leading to inaccuracies due to self-occlusion in images or the sparseness and drift of inertial sensors. Most importantly, the lack of real-world datasets containing both modalities is a major…

    Submitted 30 August, 2024; originally announced August 2024.

  7. CooTest: An Automated Testing Approach for V2X Communication Systems

    Authors: An Guo, Xinyu Gao, Zhenyu Chen, Yuan Xiao, Jiakai Liu, Xiuting Ge, Weisong Sun, Chunrong Fang

    Abstract: Perceiving the complex driving environment precisely is crucial to the safe operation of autonomous vehicles. With the tremendous advancement of deep learning and communication technology, Vehicle-to-Everything (V2X) collaboration has the potential to address limitations in sensing distant objects and occlusion for a single-agent perception system. However, despite spectacular progress, several co…

    Submitted 29 August, 2024; originally announced August 2024.

    Journal ref: Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA '24), September 16–20, 2024, Vienna, Austria

  8. arXiv:2408.16315  [pdf, other]

    cs.HC cs.LG eess.SP

    Passenger hazard perception based on EEG signals for highly automated driving vehicles

    Authors: Ashton Yu Xuan Tan, Yingkai Yang, Xiaofei Zhang, Bowen Li, Xiaorong Gao, Sifa Zheng, Jianqiang Wang, Xinyu Gu, Jun Li, Yang Zhao, Yuxin Zhang, Tania Stathaki

    Abstract: Enhancing the safety of autonomous vehicles is crucial, especially given recent accidents involving automated systems. As passengers in these vehicles, humans' sensory perception and decision-making can be integrated with autonomous systems to improve safety. This study explores neural mechanisms in passenger-vehicle interactions, leading to the development of a Passenger Cognitive Model (PCM) and…

    Submitted 29 August, 2024; originally announced August 2024.

  9. arXiv:2408.15708  [pdf, other]

    cs.CV

    Towards Realistic Example-based Modeling via 3D Gaussian Stitching

    Authors: Xinyu Gao, Ziyi Yang, Bingchen Gong, Xiaoguang Han, Sipeng Yang, Xiaogang Jin

    Abstract: Using parts of existing models to rebuild new models, commonly termed example-based modeling, is a classical methodology in the realm of computer graphics. Previous works mostly focus on shape composition, making them very hard to use for realistic composition of 3D objects captured from real-world scenes. This leads to combining multiple NeRFs into a single 3D scene to achieve seamless appeara…

    Submitted 28 August, 2024; originally announced August 2024.

  10. arXiv:2408.14892  [pdf, other]

    cs.CL cs.SD eess.AS

    A Functional Trade-off between Prosodic and Semantic Cues in Conveying Sarcasm

    Authors: Zhu Li, Xiyuan Gao, Yuqing Zhang, Shekhar Nayak, Matt Coler

    Abstract: This study investigates the acoustic features of sarcasm and disentangles the interplay between the propensity of an utterance being used sarcastically and the presence of prosodic cues signaling sarcasm. Using a dataset of sarcastic utterances compiled from television shows, we analyze the prosodic features within utterances and key phrases belonging to three distinct sarcasm categories (embedded…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: accepted at Interspeech 2024

  11. arXiv:2408.14418  [pdf, other]

    cs.CL cs.AI

    MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues

    Authors: Kuluhan Binici, Abhinav Ramesh Kashyap, Viktor Schlegel, Andy T. Liu, Vijay Prakash Dwivedi, Thanh-Tung Nguyen, Xiaoxue Gao, Nancy F. Chen, Stefan Winkler

    Abstract: Automatic Speech Recognition (ASR) systems are pivotal in transcribing speech into text, yet the errors they introduce can significantly degrade the performance of downstream tasks like summarization. This issue is particularly pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce, necessitating the use of ASR models as black-box solut…

    Submitted 26 August, 2024; originally announced August 2024.

  12. arXiv:2408.14211  [pdf, other]

    cs.CV cs.AI

    MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

    Authors: Xu He, Xiaoyu Li, Di Kang, Jiangnan Ye, Chaopeng Zhang, Liyang Chen, Xiangjun Gao, Han Zhang, Zhiyong Wu, Haolin Zhuang

    Abstract: Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data, or from 3D inconsistencies due to a lack of comprehensive multi-view knowledge. In this paper, we introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image. At its core, we leverage a pre-trained 2D d…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Project Page: https://thuhcsi.github.io/MagicMan

  13. arXiv:2408.14135  [pdf, other]

    cs.CV

    Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models

    Authors: Chaohua Shi, Xuan Wang, Si Shi, Xule Wang, Mingrui Zhu, Nannan Wang, Xinbo Gao

    Abstract: Food image composition requires the use of existing dish images and background images to synthesize a natural new image, while diffusion models have made significant advancements in image generation, enabling the construction of end-to-end architectures that yield promising results. However, existing diffusion models face challenges in processing and fusing information from multiple images and lac…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 14 pages

  14. arXiv:2408.13024  [pdf, other]

    cs.CV

    Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding

    Authors: Xianqiang Gao, Pingrui Zhang, Delin Qu, Dong Wang, Zhigang Wang, Yan Ding, Bin Zhao, Xuelong Li

    Abstract: 3D Object Affordance Grounding aims to predict the functional regions on a 3D object and has laid the foundation for a wide range of applications in robotics. Recent advances tackle this problem via learning a mapping between 3D regions and a single human-object interaction image. However, the geometric structure of the 3D object and the object in the human-object interaction image are not always…

    Submitted 23 August, 2024; originally announced August 2024.

  15. arXiv:2408.10694  [pdf, other]

    cs.CV

    MsMemoryGAN: A Multi-scale Memory GAN for Palm-vein Adversarial Purification

    Authors: Huafeng Qin, Yuming Fu, Huiyan Zhang, Mounim A. El-Yacoubi, Xinbo Gao, Qun Song, Jun Wang

    Abstract: Deep neural networks have recently achieved promising performance in the vein recognition task and have shown an increasing application trend; however, they are prone to adversarial attacks that add imperceptible perturbations to the input, resulting in incorrect recognition. To address this issue, we propose a novel defense model named MsMemoryGAN, which aims to filter the pe…

    Submitted 20 August, 2024; originally announced August 2024.

  16. arXiv:2408.09048  [pdf, other]

    q-bio.QM cs.AI cs.LG

    mRNA2vec: mRNA Embedding with Language Model in the 5'UTR-CDS for mRNA Design

    Authors: Honggen Zhang, Xiangrui Gao, June Zhang, Lipeng Lai

    Abstract: Messenger RNA (mRNA)-based vaccines are accelerating the discovery of new drugs and revolutionizing the pharmaceutical industry. However, selecting particular mRNA sequences for vaccines and therapeutics from extensive mRNA libraries is costly. Effective mRNA therapeutics require carefully designed sequences with optimized expression levels and stability. This paper proposes a novel contextual lan…

    Submitted 16 August, 2024; originally announced August 2024.

  17. arXiv:2408.08516  [pdf, other]

    cs.MA

    Multilevel Graph Reinforcement Learning for Consistent Cognitive Decision-making in Heterogeneous Mixed Autonomy

    Authors: Xin Gao, Zhaoyang Ma, Xueyuan Li, Xiaoqiang Meng, Zirui Li

    Abstract: In the realm of heterogeneous mixed autonomy, vehicles experience dynamic spatial correlations and nonlinear temporal interactions in a complex, non-Euclidean space. These complexities pose significant challenges to traditional decision-making frameworks. Addressing this, we propose a hierarchical reinforcement learning framework integrated with multilevel graph representations, which effectively…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 15 pages, 9 figures

  18. arXiv:2408.07578  [pdf, other]

    cs.MA cs.LG

    A Nested Graph Reinforcement Learning-based Decision-making Strategy for Eco-platooning

    Authors: Xin Gao, Xueyuan Li, Hao Liu, Ao Li, Zhaoyang Ma, Zirui Li

    Abstract: Platooning technology is renowned for its precise vehicle control, traffic flow optimization, and energy efficiency enhancement. However, in large-scale mixed platoons, vehicle heterogeneity and unpredictable traffic conditions lead to virtual bottlenecks. These bottlenecks result in reduced traffic throughput and increased energy consumption within the platoon. To address these challenges, we int…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 14 pages, 18 figures

  19. arXiv:2408.07476  [pdf, other]

    cs.CV

    One Step Diffusion-based Super-Resolution with Time-Aware Distillation

    Authors: Xiao He, Huaao Tang, Zhijun Tu, Junchao Zhang, Kun Cheng, Hanting Chen, Yong Guo, Mingrui Zhu, Nannan Wang, Xinbo Gao, Jie Hu

    Abstract: Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts. However, these approaches typically require tens or even hundreds of iterative samplings, resulting in significant latency. Recently, techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowl…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 18 pages

  20. arXiv:2408.06907  [pdf, other]

    cs.IT

    An Information Geometry Interpretation for Approximate Message Passing

    Authors: Bingyan Liu, An-An Lu, Mingrui Fan, Jiyuan Yang, Xiqi Gao

    Abstract: In this paper, we propose an information geometry (IG) framework to solve the standard linear regression problem. The proposed framework is an extension of the one for computing the mean of a complex multivariate Gaussian distribution. By applying the proposed framework, the information geometry approach (IGA) and the approximate information geometry approach (AIGA) for basis pursuit de-noising (BPD…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 30 pages, 5 figures

  21. arXiv:2408.05743  [pdf, other]

    cs.CV

    Neural Architecture Search based Global-local Vision Mamba for Palm-Vein Recognition

    Authors: Huafeng Qin, Yuming Fu, Jing Chen, Mounim A. El-Yacoubi, Xinbo Gao, Jun Wang

    Abstract: Due to advantages such as high security, high privacy, and liveness recognition, vein recognition has received more and more attention in past years. Recently, deep learning models such as Mamba have shown robust feature representation with linear computational complexity and have been successfully applied to visual tasks. However, vision Mamba can capture long-distance feature dependencies but unfo…

    Submitted 13 August, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

  22. arXiv:2408.05437  [pdf, other]

    cs.LG

    Predicting Long-Term Allograft Survival in Liver Transplant Recipients

    Authors: Xiang Gao, Michael Cooper, Maryam Naghibzadeh, Amirhossein Azhie, Mamatha Bhat, Rahul G. Krishnan

    Abstract: Liver allograft failure occurs in approximately 20% of liver transplant recipients within five years post-transplant, leading to mortality or the need for retransplantation. Providing an accurate and interpretable model for individualized risk estimation of graft failure is essential for improving post-transplant care. To this end, we introduce the Model for Allograft Survival (MAS), a simple line…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: Accepted at MLHC 2024

  23. arXiv:2408.04755  [pdf, other]

    cs.SE eess.SY

    Automation Configuration in Smart Home Systems: Challenges and Opportunities

    Authors: Sheik Murad Hassan Anik, Xinghua Gao, Hao Zhong, Xiaoyin Wang, Na Meng

    Abstract: With the innovation of smart devices and the internet-of-things (IoT), smart homes have become prevalent. People tend to transform residences into smart homes by customizing off-the-shelf smart home platforms, instead of creating IoT systems from scratch. Among the alternatives, Home Assistant (HA) is one of the most popular platforms. It allows end-users (i.e., home residents) to smartify homes by (S1)…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 13 pages, 3 figures, 3 tables, 10 listings

  24. arXiv:2408.03765  [pdf, other]

    cs.LG

    Reliable Node Similarity Matrix Guided Contrastive Graph Clustering

    Authors: Yunhui Liu, Xinyi Gao, Tieke He, Tao Zheng, Jianhua Zhao, Hongzhi Yin

    Abstract: Graph clustering, which involves the partitioning of nodes within a graph into disjoint clusters, holds significant importance for numerous subsequent applications. Recently, contrastive learning, known for utilizing supervisory information, has demonstrated encouraging results in deep graph clustering. This methodology facilitates the learning of favorable node representations for clustering by a…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)

  25. arXiv:2408.02907  [pdf, other]

    cs.CL

    Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering

    Authors: Tiezheng Guo, Chen Wang, Yanyi Liu, Jiawei Tang, Pan Li, Sai Xu, Qingwen Yang, Xianlin Gao, Zhi Li, Yingyou Wen

    Abstract: Retrieving external knowledge and prompting large language models with relevant information is an effective paradigm to enhance the performance of question-answering tasks. Previous research typically handles paragraphs from external documents in isolation, resulting in a lack of context and ambiguous references, particularly in multi-document and complex tasks. To overcome these challenges, we pr…

    Submitted 5 August, 2024; originally announced August 2024.

  26. arXiv:2408.01037  [pdf, other]

    cs.CV

    MambaST: A Plug-and-Play Cross-Spectral Spatial-Temporal Fuser for Efficient Pedestrian Detection

    Authors: Xiangbo Gao, Asiegbu Miracle Kanu-Asiegbu, Xiaoxiao Du

    Abstract: This paper proposes MambaST, a plug-and-play cross-spectral spatial-temporal fusion pipeline for efficient pedestrian detection. Several challenges exist for pedestrian detection in autonomous driving applications. First, it is difficult to perform accurate detection using RGB cameras under dark or low-light conditions. Cross-spectral systems must be developed to integrate complementary informatio…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: ITSC 2024 Accepted

  27. arXiv:2408.01000  [pdf, other]

    cs.LG

    Adaptive Two-Stage Cloud Resource Scaling via Hierarchical Multi-Indicator Forecasting and Bayesian Decision-Making

    Authors: Yang Luo, Shiyu Wang, Zhemeng Yu, Wei Lu, Xiaofeng Gao, Lintao Ma, Guihai Chen

    Abstract: The surging demand for cloud computing resources, driven by the rapid growth of sophisticated large-scale models and data centers, underscores the critical importance of efficient and adaptive resource allocation. As major tech enterprises deploy massive infrastructures with thousands of GPUs, existing cloud platforms still struggle with low resource utilization due to key challenges: capturing hi…

    Submitted 2 August, 2024; originally announced August 2024.

  28. arXiv:2408.00998  [pdf, other]

    cs.CV cs.AI

    FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation

    Authors: Xiang Gao, Jiaying Liu

    Abstract: Large-scale text-to-image diffusion models have been a revolutionary milestone in the evolution of generative AI and multimodal technology, allowing wonderful image generation from natural-language text prompts. However, the lack of controllability of such models restricts their practical applicability for real-life content creation. Thus, attention has been focused on leveraging a referen…

    Submitted 6 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted conference paper of ACM MM 2024

  29. arXiv:2408.00957  [pdf, other]

    cs.DC

    Caching Aided Multi-Tenant Serverless Computing

    Authors: Chu Qiao, Cong Wang, Zhenkai Zhang, Yuede Ji, Xing Gao

    Abstract: One key to enabling high-performance serverless computing is to mitigate cold-starts. Current solutions utilize a warm pool to keep functions alive: a warm-start can be analogous to a CPU cache-hit. However, modern caches have multiple hierarchies and the last-level cache is shared among cores, whereas the warm pool is limited to a single tenant for security concerns. Also, the warm pool keep-alive p…

    Submitted 1 August, 2024; originally announced August 2024.

  30. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  31. arXiv:2407.17642  [pdf, other]

    cs.LG cs.AI

    SMA-Hyper: Spatiotemporal Multi-View Fusion Hypergraph Learning for Traffic Accident Prediction

    Authors: Xiaowei Gao, James Haworth, Ilya Ilyankou, Xianghui Zhang, Tao Cheng, Stephen Law, Huanfa Chen

    Abstract: Predicting traffic accidents is the key to sustainable city management, which requires effectively addressing the dynamic and complex spatiotemporal characteristics of cities. Current data-driven models often struggle with data sparsity and typically overlook the integration of diverse urban data sources and the high-order dependencies within them. Additionally, they frequently rely on predefined to…

    Submitted 24 July, 2024; originally announced July 2024.

  32. arXiv:2407.16931  [pdf, other]

    cs.CL

    ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering

    Authors: Xiuying Chen, Tairan Wang, Taicheng Guo, Kehan Guo, Juexiao Zhou, Haoyang Li, Mingchen Zhuge, Jürgen Schmidhuber, Xin Gao, Xiangliang Zhang

    Abstract: Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth. While QA datasets are plentiful in areas like general domain and biomedicine, academic chemistry is less explored. Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into a readily understandable format. Addressing this gap, we introduce S…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 14 pages

  33. arXiv:2407.15334  [pdf, other]

    cs.CV

    Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection

    Authors: Yiran Yang, Xu Gao, Tong Wang, Xin Hao, Yifeng Shi, Xiao Tan, Xiaoqing Ye, Jingdong Wang

    Abstract: Camera and LiDAR serve as informative sensors for accurate and robust autonomous driving systems. However, these sensors often exhibit heterogeneous natures, resulting in distributional modality gaps that present significant challenges for fusion. To address this, a robust fusion technique is crucial, particularly for enhancing 3D object detection. In this paper, we introduce a dynamic adjustment…

    Submitted 21 July, 2024; originally announced July 2024.

  34. arXiv:2407.15199  [pdf, other]

    cs.CV cs.CY

    Multiple Object Detection and Tracking in Panoramic Videos for Cycling Safety Analysis

    Authors: Jingwei Guo, Meihui Wang, Ilya Ilyankou, Natchapon Jongwiriyanurak, Xiaowei Gao, Nicola Christie, James Haworth

    Abstract: Panoramic cycling videos can record 360° views around the cyclists. Thus, it is essential to conduct automatic road user analysis on them using computer vision models to provide data for studies on cycling safety. However, the features of panoramic data, such as severe distortions, a large number of small objects, and boundary continuity, have brought great challenges to the existing CV models, includi…

    Submitted 21 July, 2024; originally announced July 2024.

  35. arXiv:2407.14651  [pdf, other]

    eess.IV cs.AI cs.CV

    Improving Representation of High-frequency Components for Medical Foundation Models

    Authors: Yuetan Chu, Yilan Zhang, Zhongyi Han, Changchun Yang, Longxi Zhou, Gongning Luo, Xin Gao

    Abstract: Foundation models have recently attracted significant attention for their impressive generalizability across diverse downstream tasks. However, these models are demonstrated to exhibit great limitations in representing high-frequency components and fine-grained details. In many medical imaging tasks, the precise representation of such information is crucial due to the inherently intricate anatomic…

    Submitted 25 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  36. arXiv:2407.14367  [pdf, other]

    cs.CV

    Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations

    Authors: Decheng Liu, Zongqi Wang, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao

    Abstract: Due to the successful development of deep image generation technology, forgery detection plays a more important role in social and economic security. Racial bias has not been explored thoroughly in the deep forgery detection field. In this paper, we first contribute a dedicated dataset called the Fair Forgery Detection (FairFD) dataset, where we prove the racial bias of public state-of-the-art (SOT…

    Submitted 31 August, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  37. arXiv:2407.13252  [pdf, other]

    cs.CV

    Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models

    Authors: Qiao Li, Xiaomeng Fu, Xi Wang, Jin Liu, Xingyu Gao, Jiao Dai, Jizhong Han

    Abstract: With the rapid advancements of large-scale text-to-image diffusion models, various practical applications have emerged, bringing significant convenience to society. However, model developers may misuse unauthorized data to train diffusion models. These data are at risk of being memorized by the models, thus potentially violating citizens' privacy rights. Therefore, in order to judge whether a…

    Submitted 18 July, 2024; originally announced July 2024.

  38. arXiv:2407.12768  [pdf, other]

    quant-ph cs.CC cs.IT math-ph physics.atom-ph

    A polynomial-time classical algorithm for noisy quantum circuits

    Authors: Thomas Schuster, Chao Yin, Xun Gao, Norman Y. Yao

    Abstract: We provide a polynomial-time classical algorithm for noisy quantum circuits. The algorithm computes the expectation value of any observable for any circuit, with a small average error over input states drawn from an ensemble (e.g. the computational basis). Our approach is based upon the intuition that noise exponentially damps non-local correlations relative to local correlations. This enables one…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures + 22 page Supplementary Information

  39. arXiv:2407.12317  [pdf, other]

    cs.CV

    Out of Length Text Recognition with Sub-String Matching

    Authors: Yongkun Du, Zhineng Chen, Caiyan Jia, Xieping Gao, Yu-Gang Jiang

    Abstract: Scene Text Recognition (STR) methods have demonstrated robust performance in word-level text recognition. However, in real applications the text image is sometimes long because it is detected with multiple horizontal words. This creates the need to build long text recognition models from readily available short (i.e., word-level) text datasets, which has been less studied previously. In this paper,…

    Submitted 13 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: Preprint, 16 pages

  40. arXiv:2407.11011  [pdf, other]

    cs.CR cs.CV cs.LG

    Toward Availability Attacks in 3D Point Clouds

    Authors: Yifan Zhu, Yibo Miao, Yinpeng Dong, Xiao-Shan Gao

    Abstract: Despite the great progress of 3D vision, data privacy and security issues in 3D deep learning are not explored systematically. In the domain of 2D images, many availability attacks have been proposed to prevent data from being illicitly learned by unauthorized deep models. However, unlike images represented on a fixed dimensional grid, point clouds are characterized as unordered and unstructured s…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: ICML 2024, 21 pages

  41. On the Spectral Efficiency of Multi-user Holographic MIMO Uplink Transmission

    Authors: Mengyu Qian, Li You, Xiang-Gen Xia, Xiqi Gao

    Abstract: With antenna spacing much less than half a wavelength in confined space, holographic multiple-input multiple-output (HMIMO) technology presents a promising frontier in next-generation mobile communication. We delve into the research of the multi-user uplink transmission with both the base station and the users equipped with holographic planar arrays. To begin, we construct an HMIMO channel model u…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 14 pages, 7 figures, to appear in IEEE Transactions on Wireless Communications

  42. arXiv:2407.10439  [pdf, other]

    cs.CV

    PolyRoom: Room-aware Transformer for Floorplan Reconstruction

    Authors: Yuzhou Liu, Lingjie Zhu, Xiaodong Ma, Hanqiao Ye, Xiang Gao, Xianwei Zheng, Shuhan Shen

    Abstract: Reconstructing geometry and topology structures from raw unstructured data has always been an important research topic in indoor mapping research. In this paper, we aim to reconstruct the floorplan with a vectorized representation from point clouds. Despite significant advancements achieved in recent years, current methods still encounter several challenges, such as missing corners or edges, inacc…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  43. arXiv:2407.10281

    cs.CV

    Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning

    Authors: Xinyuan Gao, Songlin Dong, Yuhang He, Qiang Wang, Yihong Gong

    Abstract: The problem of Rehearsal-Free Continual Learning (RFCL) aims to continually learn new knowledge while preventing forgetting of the old knowledge, without storing any old samples and prototypes. The latest methods leverage large-scale pre-trained models as the backbone and use key-query matching to generate trainable prompts to learn new knowledge. However, the domain gap between the pre-training d…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  44. arXiv:2407.10172

    cs.CV

    Restoring Images in Adverse Weather Conditions via Histogram Transformer

    Authors: Shangquan Sun, Wenqi Ren, Xinwei Gao, Rui Wang, Xiaochun Cao

    Abstract: Transformer-based image restoration methods in adverse weather have achieved significant progress. Most of them use self-attention along the channel dimension or within spatially fixed-range blocks to reduce computational load. However, such a compromise results in limitations in capturing long-range spatial features. Inspired by the observation that the weather-induced degradation factors mainly…

    Submitted 25 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: 19 pages, 7 figures, 10MB

  45. arXiv:2407.07805

    cs.CV

    SUMix: Mixup with Semantic and Uncertain Information

    Authors: Huafeng Qin, Xin Jin, Hongyu Zhu, Hongchao Liao, Mounîm A. El-Yacoubi, Xinbo Gao

    Abstract: Mixup data augmentation approaches have been applied to various deep learning tasks to improve the generalization ability of deep neural networks. Some existing approaches, such as CutMix and SaliencyMix, randomly replace a patch in one image with patches from another to generate the mixed image. Similarly, the corresponding labels are linearly combined by a fixed ratio $λ$. The objects in two i…

    Submitted 3 September, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024 [Camera Ready] (19 pages, 7 figures) with the source code at https://github.com/JinXins/SUMix

  46. arXiv:2407.07520

    cs.CV

    IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

    Authors: Mingjin Zhang, Yuchun Wang, Jie Guo, Yunsong Li, Xinbo Gao, Jing Zhang

    Abstract: The recent Segment Anything Model (SAM) is a significant advancement in natural image segmentation, exhibiting potent zero-shot performance suitable for various downstream image segmentation tasks. However, directly utilizing the pretrained SAM for Infrared Small Target Detection (IRSTD) task falls short in achieving satisfying performance due to a notable domain gap between natural and infrared i…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 18 pages, 8 figures, to be published in ECCV2024

  47. arXiv:2407.07372

    eess.IV cs.CV

    Trustworthy Contrast-enhanced Brain MRI Synthesis

    Authors: Jiyao Liu, Yuxin Li, Shangqi Gao, Yuncheng Zhou, Xin Gao, Ningsheng Xu, Xiao-Yong Zhang, Xiahai Zhuang

    Abstract: Contrast-enhanced brain MRI (CE-MRI) is a valuable diagnostic technique but may pose health risks and incur high costs. To create safer alternatives, multi-modality medical image translation aims to synthesize CE-MRI images from other available modalities. Although existing methods can generate promising predictions, they still face two challenges, i.e., exhibiting over-confidence and lacking inte…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

  48. arXiv:2407.05965

    cs.CV cs.AI cs.CL cs.CR cs.LG

    T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models

    Authors: Yibo Miao, Yifan Zhu, Yinpeng Dong, Lijia Yu, Jun Zhu, Xiao-Shan Gao

    Abstract: The recent development of Sora leads to a new era in text-to-video (T2V) generation. Along with this comes the rising concern about its security risks. The generated videos may contain illegal or unethical content, and there is a lack of comprehensive quantitative understanding of their safety, posing a challenge to their reliability and practical deployment. Previous evaluations primarily focus o…

    Submitted 1 September, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  49. Ubiquitous Integrated Sensing and Communications for Massive MIMO LEO Satellite Systems

    Authors: Li You, Yongxiang Zhu, Xiaoyu Qiang, Christos G. Tsinos, Wenjin Wang, Xiqi Gao, Björn Ottersten

    Abstract: The next sixth generation (6G) networks are envisioned to integrate sensing and communications in a single system, thus greatly improving spectrum utilization and reducing hardware costs. Low earth orbit (LEO) satellite communications combined with massive multiple-input multiple-output (MIMO) technology holds significant promise in offering ubiquitous and seamless connectivity with high data rate…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 6 pages, 4 figures

    Journal ref: IEEE Internet of Things Magazine, vol. 7, no. 4, pp. 30-35, Jul. 2024

  50. Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation

    Authors: Xiang Gao, Zhengbo Xu, Junhan Zhao, Jiaying Liu

    Abstract: Recently, large-scale text-to-image (T2I) diffusion models have emerged as a powerful tool for image-to-image translation (I2I), allowing open-domain image translation via user-provided text prompts. This paper proposes frequency-controlled diffusion model (FCDiffusion), an end-to-end diffusion-based framework that contributes a novel solution to text-guided I2I from a frequency-domain perspective…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI 2024)

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(3), 1824-1832