Showing 1–50 of 1,956 results for author: Li, M

  1. arXiv:2409.03605  [pdf, other]

    cs.CV cs.MM

    SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing

    Authors: Lingyu Xiong, Xize Cheng, Jintao Tan, Xianjia Wu, Xiandong Li, Lei Zhu, Fei Ma, Minglei Li, Huang Xu, Zhihu Hu

    Abstract: Audio-driven talking face generation aims to synthesize video with lip movements synchronized to input audio. However, current generative techniques face challenges in preserving intricate regional textures (skin, teeth). To address the aforementioned challenges, we propose a novel framework called SegTalker to decouple lip movements and image textures by introducing segmentation as intermediate r…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 10 pages, 7 figures, 3 tables

  2. arXiv:2409.03597  [pdf, other]

    cs.SD cs.AI eess.AS

    Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Cord Paralysis

    Authors: Yucong Zhang, Xin Zou, Jinshan Yang, Wenjun Chen, Faya Liang, Ming Li

    Abstract: This paper presents the Multimodal Analyzing System for Laryngoscope (MASL), a system that combines audio and video data to automatically extract key segments and metrics from laryngeal videostroboscopic videos for clinical assessment. MASL integrates glottis detection with keyword spotting to analyze patient vocalizations and refine video highlights for better inspection of vocal cord movements.…

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2409.03594  [pdf, other]

    cs.GT

    A Complete Landscape of EFX Allocations of Mixed Manna on Graphs

    Authors: Yu Zhou, Tianze Wei, Minming Li, Bo Li

    Abstract: We study envy-free up to any item (EFX) allocations on graphs where vertices and edges represent agents and items respectively. An agent is only interested in items that are incident to her and all other items have zero marginal values to her. Christodoulou et al. [EC, 2023] first proposed this setting and studied the case of goods. We extend this setting to the case of mixed manna where an item m…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted in IJCAI 2024

  4. arXiv:2409.03512  [pdf, other]

    cs.CY cs.CL

    From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

    Authors: Jifan Yu, Zheyuan Zhang, Daniel Zhang-li, Shangqing Tu, Zhanxin Hao, Rui Miao Li, Haoxuan Li, Yuanchun Wang, Hanming Li, Linlu Gong, Jie Cao, Jiayin Lin, Jinchang Zhou, Fei Qin, Haohua Wang, Jianxiao Jiang, Lijun Deng, Yisi Zhan, Chaojun Xiao, Xusheng Dai, Xuan Yan, Nianyi Lin, Nan Zhang, Ruixin Ni, Yang Dang, et al. (8 additional authors not shown)

    Abstract: Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still holds significant potential for improvement, new AI technologies have been continuously integ…

    Submitted 5 September, 2024; originally announced September 2024.

  5. arXiv:2409.03368  [pdf, other]

    cs.NE

    Training-free Conversion of Pretrained ANNs to SNNs for Low-Power and High-Performance Applications

    Authors: Tong Bu, Maohua Li, Zhaofei Yu

    Abstract: Spiking Neural Networks (SNNs) have emerged as a promising substitute for Artificial Neural Networks (ANNs) due to their advantages of fast inference and low power consumption. However, the lack of efficient training algorithms has hindered their widespread adoption. Existing supervised learning algorithms for SNNs require significantly more memory and time than their ANN counterparts. Even common…

    Submitted 5 September, 2024; originally announced September 2024.

  6. arXiv:2409.03206  [pdf, other]

    cs.CV cs.AI

    TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations

    Authors: Mingze Gao, Jingyu Liu, Mingda Li, Jiangtao Xie, Qingbin Liu, Bo Zhao, Xi Chen, Hui Xiong

    Abstract: Multimodal Large Language Models (MLLMs) have significantly improved performance across various image-language applications. Recently, there has been a growing interest in adapting image pre-trained MLLMs for video-related tasks. However, most efforts concentrate on enhancing the vision encoder and projector components, while the core part, Large Language Models (LLMs), remains comparatively under…

    Submitted 4 September, 2024; originally announced September 2024.

  7. arXiv:2409.02615  [pdf, other]

    eess.AS cs.SD

    USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction

    Authors: Bang Zeng, Ming Li

    Abstract: Target speaker extraction aims to isolate the voice of a specific speaker from mixed speech. Traditionally, this process has relied on extracting a speaker embedding from a reference speech, necessitating a speaker recognition model. However, identifying an appropriate speaker recognition model can be challenging, and using the target speaker embedding as reference information may not be optimal f…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 13 pages, 6 figures

  8. arXiv:2409.02375  [pdf, other]

    cs.CL

    How Privacy-Savvy Are Large Language Models? A Case Study on Compliance and Privacy Technical Review

    Authors: Xichou Zhu, Yang Liu, Zhou Shen, Yi Liu, Min Li, Yujun Chen, Benzi John, Zhenzhen Ma, Tao Hu, Bolong Yang, Manman Wang, Zongxing Xie, Peng Liu, Dan Cai, Junhui Wang

    Abstract: The recent advances in large language models (LLMs) have significantly expanded their applications across various fields such as language generation, summarization, and complex question answering. However, their application to privacy compliance and technical privacy reviews remains under-explored, raising critical concerns about their ability to adhere to global privacy standards and protect sens…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 8 pages, 4 figures

  9. arXiv:2409.02370  [pdf, other]

    cs.CL cs.AI

    Do Large Language Models Possess Sensitive to Sentiment?

    Authors: Yang Liu, Xichou Zhu, Zhou Shen, Yi Liu, Min Li, Yujun Chen, Benzi John, Zhenzhen Ma, Tao Hu, Zhiyang Xu, Wei Luo, Junhui Wang

    Abstract: Large Language Models (LLMs) have recently displayed their extraordinary capabilities in language understanding. However, how to comprehensively assess the sentiment capabilities of LLMs continues to be a challenge. This paper investigates the ability of LLMs to detect and react to sentiment in the text modality. As the integration of LLMs into diverse applications is on the rise, it becomes highly criti…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 10 pages, 2 figures

  10. arXiv:2409.01315  [pdf, other]

    physics.comp-ph cs.AI cs.LG

    Multi-frequency Neural Born Iterative Method for Solving 2-D Inverse Scattering Problems

    Authors: Daoqi Liu, Tao Shan, Maokun Li, Fan Yang, Shenheng Xu

    Abstract: In this work, we propose a deep learning-based imaging method for addressing the multi-frequency electromagnetic (EM) inverse scattering problem (ISP). By combining deep learning technology with EM physical laws, we have successfully developed a multi-frequency neural Born iterative method (NeuralBIM), guided by the principles of the single-frequency NeuralBIM. This method integrates multitask lea…

    Submitted 2 September, 2024; originally announced September 2024.

    MSC Class: 35Q61; ACM Class: I.2.6; G.1.8; G.1.3

  11. arXiv:2409.01282  [pdf]

    cs.CV cs.CR cs.LG

    One-Index Vector Quantization Based Adversarial Attack on Image Classification

    Authors: Haiju Fan, Xiaona Qin, Shuang Chen, Hubert P. H. Shum, Ming Li

    Abstract: To improve storage and transmission, images are generally compressed. Vector quantization (VQ) is a popular compression method as it has a high compression ratio that surpasses other compression techniques. Despite this, existing adversarial attack methods on image classification are mostly performed in the pixel domain with few exceptions in the compressed domain, making them less applicable in…

    Submitted 2 September, 2024; originally announced September 2024.

  12. arXiv:2409.00250  [pdf, other]

    cs.CV

    Medical Report Generation Is A Multi-label Classification Problem

    Authors: Yijian Fan, Zhenbang Yang, Rui Liu, Mingjie Li, Xiaojun Chang

    Abstract: Medical report generation is a critical task in healthcare that involves the automatic creation of detailed and accurate descriptions from medical images. Traditionally, this task has been approached as a sequence generation problem, relying on vision-and-language techniques to generate coherent and contextually relevant reports. However, in this paper, we propose a novel perspective: rethinking m…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: Accepted to 2024 IEEE International Conference on Medical Artificial Intelligence

  13. arXiv:2409.00133  [pdf, other]

    cs.CL cs.AI

    A Survey for Large Language Models in Biomedicine

    Authors: Chong Wang, Mengyao Li, Junjun He, Zhongruo Wang, Erfan Darzi, Zan Chen, Jin Ye, Tianbin Li, Yanzhou Su, Jing Ke, Kaili Qu, Shuxin Li, Yi Yu, Pietro Liò, Tianyun Wang, Yu Guang Wang, Yiqing Shen

    Abstract: Recent breakthroughs in large language models (LLMs) offer unprecedented natural language understanding and generation capabilities. However, existing surveys on LLMs in biomedicine often focus on specific applications or model architectures, lacking a comprehensive analysis that integrates the latest advancements across various biomedical domains. This review, based on an analysis of 484 publicat…

    Submitted 29 August, 2024; originally announced September 2024.

  14. arXiv:2408.16423  [pdf, other]

    eess.AS cs.SD

    WHISMA: A Speech-LLM to Perform Zero-shot Spoken Language Understanding

    Authors: Mohan Li, Cong-Thanh Do, Simon Keizer, Youmna Farag, Svetlana Stoyanchev, Rama Doddipatla

    Abstract: Speech large language models (speech-LLMs) integrate speech and text-based foundation models to provide a unified framework for handling a wide range of downstream tasks. In this paper, we introduce WHISMA, a speech-LLM tailored for spoken language understanding (SLU) that demonstrates robust performance in various zero-shot settings. WHISMA combines the speech encoder from Whisper with the Llama-…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: accepted to SLT 2024

  15. arXiv:2408.15516  [pdf, other]

    cs.NI

    Predicting Parameter Change's Effect on Cellular Network Time Series

    Authors: Mingjie Li, Yongqian Sun, Xiaolei Hua, Renkai Yu, Xinwen Fan, Lin Zhu, Junlan Feng, Dan Pei

    Abstract: The cellular network provides convenient network access for ever-growing mobile phones. During the continuous optimization, operators can adjust cell parameters to enhance the Quality of Service (QoS) flexibly. A precise prediction of the parameter change's effect can help operators make proper parameter adjustments. This work focuses on predicting cell status (like the workload and QoS) after adj…

    Submitted 28 August, 2024; originally announced August 2024.

  16. arXiv:2408.15283  [pdf, other]

    cs.CV eess.SP

    3D Photon Counting CT Image Super-Resolution Using Conditional Diffusion Model

    Authors: Chuang Niu, Christopher Wiedeman, Mengzhou Li, Jonathan S Maltz, Ge Wang

    Abstract: This study aims to improve photon counting CT (PCCT) image resolution using denoising diffusion probabilistic models (DDPM). Although DDPMs have shown superior performance when applied to various computer vision tasks, their effectiveness has yet to be translated to high dimensional CT super-resolution. To train DDPMs in a conditional sampling manner, we first leverage CatSim to simulate realistic…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 17th International Meeting on Fully 3D Image Reconstruction in Radiology and Nuclear Medicine, Stony Brook, NY, USA, 2023 [arXiv:2310.16846]

  17. arXiv:2408.15026  [pdf, other]

    cs.CV cs.AI

    Sequence-aware Pre-training for Echocardiography Probe Guidance

    Authors: Haojun Jiang, Zhenguo Sun, Yu Sun, Ning Jia, Meng Li, Shaqi Luo, Shiji Song, Gao Huang

    Abstract: Cardiac ultrasound probe guidance aims to help novices adjust the 6-DOF probe pose to obtain high-quality sectional images. Cardiac ultrasound faces two major challenges: (1) the inherently complex structure of the heart, and (2) significant individual variations. Previous works have only learned the population-averaged 2D and 3D structures of the heart rather than personalized cardiac structural…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Tech Report

  18. arXiv:2408.14515  [pdf, other]

    cs.SE cs.AI cs.LG cs.PL

    A Joint Learning Model with Variational Interaction for Multilingual Program Translation

    Authors: Yali Du, Hui Sun, Ming Li

    Abstract: Programs implemented in various programming languages form the foundation of software applications. To alleviate the burden of program migration and facilitate the development of software systems, automated program translation across languages has garnered significant attention. Previous approaches primarily focus on pairwise translation paradigms, learning translation between pairs of languages u…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Accepted by the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024)

  19. arXiv:2408.14493  [pdf]

    cs.LG eess.SY

    Extraction of Typical Operating Scenarios of New Power System Based on Deep Time Series Aggregation

    Authors: Zhaoyang Qu, Zhenming Zhang, Nan Qu, Yuguang Zhou, Yang Li, Tao Jiang, Min Li, Chao Long

    Abstract: Extracting typical operational scenarios is essential for making flexible decisions in the dispatch of a new power system. This study proposed a novel deep time series aggregation scheme (DTSAs) to generate typical operational scenarios, considering the large amount of historical operational snapshot data. Specifically, DTSAs analyze the intrinsic mechanisms of different scheduling operational sce…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by CAAI Transactions on Intelligence Technology

  20. arXiv:2408.12981  [pdf, other]

    cs.AI

    QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval

    Authors: Chenghua Gao, Min Li, Jianshuo Liu, Junxing Ren, Lin Chen, Haoyu Liu, Bo Meng, Jitao Fu, Wenwen Su

    Abstract: Video Moment Retrieval (VMR) aims to retrieve relevant moments of an untrimmed video corresponding to the query. While cross-modal interaction approaches have shown progress in filtering out query-irrelevant information in videos, they assume the precise alignment between the query semantics and the corresponding video moments, potentially overlooking the misunderstanding of the natural language s…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 figures, 4 tables

  21. arXiv:2408.12867  [pdf, other]

    cs.CV

    Semantic Alignment for Multimodal Large Language Models

    Authors: Tao Wu, Mengze Li, Jingyuan Chen, Wei Ji, Wang Lin, Jinyang Gao, Kun Kuang, Zhou Zhao, Fei Wu

    Abstract: Research on Multi-modal Large Language Models (MLLMs) towards the multi-image cross-modal instruction has received increasing attention and made significant progress, particularly in scenarios involving closely resembling images (e.g., change captioning). Existing MLLMs typically follow a two-step process in their pipelines: first, extracting visual tokens independently for each input image, and t…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by MM 2024

  22. arXiv:2408.12606  [pdf, other]

    cs.CV cs.AI

    Towards Non-invasive and Personalized Management of Breast Cancer Patients from Multiparametric MRI via A Large Mixture-of-Modality-Experts Model

    Authors: Luyang Luo, Mingxiang Wu, Mei Li, Yi Xin, Qiong Wang, Varut Vardhanabhuti, Winnie CW Chu, Zhenhui Li, Juan Zhou, Pranav Rajpurkar, Hao Chen

    Abstract: Breast magnetic resonance imaging (MRI) is the imaging technique with the highest sensitivity for detecting breast cancer and is routinely used for women at high risk. Despite the comprehensive multiparametric protocol of breast MRI, existing artificial intelligence-based studies predominantly rely on single sequences and have limited validation. Here we report a large mixture-of-modality-experts…

    Submitted 1 September, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 27 pages, 8 figures, 10 tables

  23. arXiv:2408.12325  [pdf, other]

    cs.CL

    Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators

    Authors: Dingkang Yang, Dongling Xiao, Jinjie Wei, Mingcheng Li, Zhaoyu Chen, Ke Li, Lihua Zhang

    Abstract: Despite their remarkable capabilities, Large Language Models (LLMs) are prone to generate responses that contradict verifiable facts, i.e., unfaithful hallucination content. Existing efforts generally focus on optimizing model parameters or editing semantic representations, which compromise the internal factual knowledge of target LLMs. In addition, hallucinations typically exhibit multifaceted pa…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Hallucination Mitigation in LLMs

  24. arXiv:2408.11982  [pdf, other]

    eess.IV cs.CV cs.MM

    AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

    Authors: Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, Zicheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Xiaoheng Tan, Haiqiang Wang, Xiaozhong Xu, et al. (11 additional authors not shown)

    Abstract: Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dat…

    Submitted 28 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  25. arXiv:2408.11540  [pdf, other]

    cs.CV

    DeRainGS: Gaussian Splatting for Enhanced Scene Reconstruction in Rainy Environments

    Authors: Shuhong Liu, Xiang Chen, Hongming Chen, Quanfeng Xu, Mingrui Li

    Abstract: Reconstruction under adverse rainy conditions poses significant challenges due to reduced visibility and the distortion of visual perception. These conditions can severely impair the quality of geometric maps, which is essential for applications ranging from autonomous planning to environmental monitoring. In response to these challenges, this study introduces the novel task of 3D Reconstruction i…

    Submitted 21 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  26. arXiv:2408.11286  [pdf, ps, other]

    cs.CV

    Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model

    Authors: Mengying Ge, Dongkai Tang, Mingyang Li

    Abstract: Multimodal emotion recognition is a task of great concern. However, traditional data sets are based on fixed labels, resulting in models that often focus on main emotions and ignore detailed emotional changes in complex scenes. This report introduces the solution of using MLLMs technology to generate open-vocabulary emotion labels from a video. The solution includes the use of framework, data gene…

    Submitted 21 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  27. arXiv:2408.10500  [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition

    Authors: Zebang Cheng, Shuyuan Tu, Dawei Huang, Minghan Li, Xiaojiang Peng, Zhi-Qi Cheng, Alexander G. Hauptmann

    Abstract: This paper presents our winning approach for the MER-NOISE and MER-OV tracks of the MER2024 Challenge on multimodal emotion recognition. Our system leverages the advanced emotional understanding capabilities of Emotion-LLaMA to generate high-quality annotations for unlabeled samples, addressing the challenge of limited labeled data. To enhance multimodal fusion while mitigating modality-specific n…

    Submitted 21 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: Ranked 1st in MER24@IJCAI and MRAC24@ACM MM (MER-NOISE & MER-OV (self-evaluated))

  28. AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference

    Authors: Shuzhang Zhong, Ling Liang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li

    Abstract: Mixture-of-Experts (MoE) models are designed to enhance the efficiency of large language models (LLMs) without proportionally increasing the computational demands. However, their deployment on edge devices still faces significant challenges due to high on-demand loading overheads from managing sparsely activated experts. This paper introduces AdapMoE, an algorithm-system co-design framework for ef…

    Submitted 18 August, 2024; originally announced August 2024.

  29. arXiv:2408.09701  [pdf, other]

    cs.CL

    Bridging the Language Gap: Enhancing Multilingual Prompt-Based Code Generation in LLMs via Zero-Shot Cross-Lingual Transfer

    Authors: Mingda Li, Abhijit Mishra, Utkarsh Mujumdar

    Abstract: The use of Large Language Models (LLMs) for program code generation has gained substantial attention, but their biases and limitations with non-English prompts challenge global inclusivity. This paper investigates the complexities of multilingual prompt-based code generation. Our evaluations of LLMs, including CodeLLaMa and CodeGemma, reveal significant disparities in code quality for non-English…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Under Review

    MSC Class: 68T50 (Primary) 68T07 (Secondary)

  30. arXiv:2408.09615  [pdf, other]

    cs.CV

    The First Competition on Resource-Limited Infrared Small Target Detection Challenge: Methods and Results

    Authors: Boyang Li, Xinyi Ying, Ruojing Li, Yongxian Liu, Yangsi Shi, Miao Li

    Abstract: In this paper, we briefly summarize the first competition on resource-limited infrared small target detection (namely, LimitIRSTD). This competition has two tracks, including weakly-supervised infrared small target detection (Track 1) and lightweight infrared small target detection (Track 2). 46 and 60 teams successfully registered and took part in Track 1 and Track 2, respectively. The top-perfo…

    Submitted 18 August, 2024; originally announced August 2024.

  31. arXiv:2408.09395  [pdf, other]

    cs.CV

    OU-CoViT: Copula-Enhanced Bi-Channel Multi-Task Vision Transformers with Dual Adaptation for OU-UWF Images

    Authors: Yang Li, Jianing Deng, Chong Zhong, Danjuan Yang, Meiyan Li, A. H. Welsh, Aiyi Liu, Xingtao Zhou, Catherine C. Liu, Bo Fu

    Abstract: Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging and joint modeling of multiple discrete and continuous clinical scores presents a promising new paradigm for multi-task problems in Ophthalmology. The bi-channel framework that arises from the Ophthalmic phenomenon of "interocular asymmetries" of both eyes (OU) calls for new employment on the SOTA transformer-based models.…

    Submitted 18 August, 2024; originally announced August 2024.

  32. arXiv:2408.09122  [pdf, other]

    cs.CV

    MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation

    Authors: Xiao Zhao, Xukun Zhang, Dingkang Yang, Mingyang Sun, Mingcheng Li, Shunli Wang, Lihua Zhang

    Abstract: Accurate and robust multimodal multi-task perception is crucial for modern autonomous driving systems. However, current multimodal perception research follows independent paradigms designed for specific perception tasks, leading to a lack of complementary learning among tasks and decreased performance in multi-task learning (MTL) due to joint training. In this paper, we propose MaskBEV, a masked a…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024

  33. HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction

    Authors: Xiao Zhao, Bo Chen, Mingyang Sun, Dingkang Yang, Youxing Wang, Xukun Zhang, Mingcheng Li, Dongliang Kou, Xiaoyi Wei, Lihua Zhang

    Abstract: Vision-based 3D semantic scene completion (SSC) describes autonomous driving scenes through 3D volume representations. However, the occlusion of invisible voxels by scene surfaces poses challenges to current SSC methods in hallucinating refined 3D geometry. This paper proposes HybridOcc, a hybrid 3D volume query proposal method generated by Transformer framework and NeRF representation and refined…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted to IEEE RAL

  34. arXiv:2408.08703  [pdf, other]

    cs.CV

    TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning

    Authors: Miaoge Li, Jingcai Guo, Richard Yi Da Xu, Dongsheng Wang, Xiaofeng Cao, Song Guo

    Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize novel state-object compositions by leveraging the shared knowledge of their primitive components. Despite considerable progress, effectively calibrating the bias between semantically similar multimodal representations, as well as generalizing pre-trained knowledge to novel compositional contexts, remains an enduring challenge. In t…

    Submitted 22 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: 12 pages, 8 figures

  35. arXiv:2408.08493  [pdf, other]

    cs.LG stat.ML

    Fishers Harvest Parallel Unlearning in Inherited Model Networks

    Authors: Xiao Liu, Mingyuan Li, Xu Wang, Guangsheng Yu, Wei Ni, Lixiang Li, Haipeng Peng, Renping Liu

    Abstract: Unlearning in various learning frameworks remains challenging, with the continuous growth and updates of models exhibiting complex inheritance relationships. This paper presents a novel unlearning framework, which enables fully parallel unlearning among models exhibiting inheritance. A key enabler is the new Unified Model Inheritance Graph (UMIG), which captures the inheritance using a Directed Ac…

    Submitted 20 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  36. arXiv:2408.06679  [pdf, other]

    cs.LG q-fin.ST stat.ML

    Case-based Explainability for Random Forest: Prototypes, Critics, Counter-factuals and Semi-factuals

    Authors: Gregory Yampolsky, Dhruv Desai, Mingshu Li, Stefano Pasquali, Dhagash Mehta

    Abstract: The explainability of black-box machine learning algorithms, commonly known as Explainable Artificial Intelligence (XAI), has become crucial for financial and other regulated industrial applications due to regulatory requirements and the need for transparency in business practices. Among the various paradigms of XAI, Explainable Case-Based Reasoning (XCBR) stands out as a pragmatic approach that e…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 8 pages, 2 figures, 5 tables

  37. arXiv:2408.06666  [pdf, ps, other]

    cs.RO eess.SY

    Design of a Double-joint Robotic Fish Using a Composite Linkage

    Authors: Ruijia Zhang, Wenke Zhou, Min Li, Miao Li

    Abstract: Robotic fish is one of the most promising directions of the new generation of underwater vehicles. Traditional biomimetic fish often mimic fish joints using tandem components like servos, which leads to increased volume, weight and control complexity. In this paper, a new double-joint robotic fish using a composite linkage was designed, where the propulsion mechanism transforms the single-degree-o…

    Submitted 13 August, 2024; originally announced August 2024.

  38. arXiv:2408.05715  [pdf, other]

    cs.AI cs.SE

    Top Pass: Improve Code Generation by Pass@k-Maximized Code Ranking

    Authors: Zhi-Cun Lyu, Xin-Ye Li, Zheng Xie, Ming Li

    Abstract: Code generation has been greatly enhanced by the profound advancements in Large Language Models (LLMs) recently. Nevertheless, such LLM-based code generation approaches still struggle to generate error-free code in a few tries when faced with complex problems. To address this, the prevailing strategy is to sample a huge number of candidate programs, in the hope that any one of them could work. How…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted by Frontier of Computer Science

  39. arXiv:2408.05363  [pdf, other]

    cs.CV

    AyE-Edge: Automated Deployment Space Search Empowering Accuracy yet Efficient Real-Time Object Detection on the Edge

    Authors: Chao Wu, Yifan Gong, Liangkai Liu, Mengquan Li, Yushu Wu, Xuan Shen, Zhimin Li, Geng Yuan, Weisong Shi, Yanzhi Wang

    Abstract: Object detection on the edge (Edge-OD) is in growing demand thanks to its ever-broad application prospects. However, the development of this field is rigorously restricted by the deployment dilemma of simultaneously achieving high accuracy, excellent power efficiency, and meeting strict real-time requirements. To tackle this dilemma, we propose AyE-Edge, the first-of-this-kind development tool tha…

    Submitted 25 July, 2024; originally announced August 2024.

  40. arXiv:2408.05019  [pdf, other]

    cs.CV

    Instruction Tuning-free Visual Token Complement for Multimodal LLMs

    Authors: Dongsheng Wang, Jiequan Cui, Miaoge Li, Wang Lin, Bo Chen, Hanwang Zhang

    Abstract: As the open community of large language models (LLMs) matures, multimodal LLMs (MLLMs) have promised an elegant bridge between vision and language. However, current research is inherently constrained by challenges such as the need for high-quality instruction pairs and the loss of visual information in image-to-text training objectives. To this end, we propose a Visual Token Complement framework (…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV2024 (20pages)

  41. arXiv:2408.04836  [pdf, other]

    cs.CG cs.DC

    Distributed Augmentation, Hypersweeps, and Branch Decomposition of Contour Trees for Scientific Exploration

    Authors: Mingzhe Li, Hamish Carr, Oliver Rübel, Bei Wang, Gunther H. Weber

    Abstract: Contour trees describe the topology of level sets in scalar fields and are widely used in topological data analysis and visualization. A main challenge of utilizing contour trees for large-scale scientific data is their computation at scale using high-performance computing. To address this challenge, recent work has introduced distributed hierarchical contour trees for distributed computation and…

    Submitted 8 August, 2024; originally announced August 2024.

  42. arXiv:2408.04823  [pdf, other]

    cs.CV

    One Shot is Enough for Sequential Infrared Small Target Segmentation

    Authors: Bingbing Dan, Meihui Li, Tao Tang, Jing Zhang

    Abstract: Infrared small target sequences exhibit strong similarities between frames and contain rich contextual information, which motivates us to achieve sequential infrared small target segmentation with minimal data. Inspired by the success of large segmentation models led by Segment Anything Model (SAM) across various downstream tasks, we propose a one-shot and training-free method that perfectly adapt…

    Submitted 8 August, 2024; originally announced August 2024.

  43. arXiv:2408.04682  [pdf, other]

    cs.CL cs.AI cs.LG

    ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

    Authors: Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang

    Abstract: Recent large language models (LLMs) advancements sparked a growing research interest in tool assisted LLMs solving real-world challenges, which calls for comprehensive evaluation of tool-use capabilities. While previous works focused on either evaluating over stateless web services (RESTful API), based on a single turn user prompt, or an off-policy dialog trajectory, ToolSandbox includes stateful…

    Submitted 8 August, 2024; originally announced August 2024.

  44. arXiv:2408.03977  [pdf, other]

    cs.LG cs.AI

    Learning from Noisy Labels for Long-tailed Data via Optimal Transport

    Authors: Mengting Li, Chuang Zhu

    Abstract: Noisy labels, which are common in real-world datasets, can significantly impair the training of deep learning models. However, recent adversarial noise-combating methods overlook the long-tailed distribution of real data, which can significantly harm the effect of denoising strategies. Meanwhile, the mismanagement of noisy labels further compromises the model's ability to handle long-tailed data.…

    Submitted 7 August, 2024; originally announced August 2024.

  45. arXiv:2408.03545  [pdf, other]

    cs.CV

    CLIP-based Point Cloud Classification via Point Cloud to Image Translation

    Authors: Shuvozit Ghose, Manyi Li, Yiming Qian, Yang Wang

    Abstract: Point cloud understanding is an inherently challenging problem because of the sparse and unordered structure of the point cloud in the 3D space. Recently, Contrastive Vision-Language Pre-training (CLIP) based point cloud classification model i.e. PointCLIP has added a new direction in the point cloud classification research domain. In this method, at first multi-view depth maps are extracted from…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted by ICPR 2024

  46. arXiv:2408.02963  [pdf, other]

    cs.SE

    Adversarial Robustness of Open-source Text Classification Models and Fine-Tuning Chains

    Authors: Hao Qin, Mingyang Li, Junjie Wang, Qing Wang

    Abstract: Context: With the advancement of artificial intelligence (AI) technology and applications, numerous AI models have been developed, leading to the emergence of open-source model hosting platforms like Hugging Face (HF). Thanks to these platforms, individuals can directly download and use models, as well as fine-tune them to construct more domain-specific models. However, just like traditional softwa…

    Submitted 6 August, 2024; originally announced August 2024.

  47. arXiv:2408.02854  [pdf]

    cs.IR

    Wiping out the limitations of Large Language Models -- A Taxonomy for Retrieval Augmented Generation

    Authors: Mahei Manhai Li, Irina Nikishina, Özge Sevgili, Martin Semmann

    Abstract: Current research on RAGs is distributed across various disciplines, and since the technology is evolving very quickly, its unit of analysis is mostly on technological innovations, rather than applications in business contexts. Thus, in this research, we aim to create a taxonomy to conceptualize a comprehensive overview of the constituting characteristics that define RAG applications, facilitating…

    Submitted 12 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  48. arXiv:2408.02355  [pdf, other]

    stat.ML cs.LG q-fin.ST q-fin.TR

    Quantile Regression using Random Forest Proximities

    Authors: Mingshu Li, Bhaskarjit Sarmah, Dhruv Desai, Joshua Rosaler, Snigdha Bhagat, Philip Sommer, Dhagash Mehta

    Abstract: Due to the dynamic nature of financial markets, maintaining models that produce precise predictions over time is difficult. Often the goal isn't just point prediction but determining uncertainty. Quantifying uncertainty, especially the aleatoric uncertainty due to the unpredictable nature of market drivers, helps investors understand varying risk levels. Recently, quantile regression forests (QRF)…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 9 pages, 5 figures, 3 tables

  49. arXiv:2408.02024  [pdf, other]

    cs.CV

    Faster Diffusion Action Segmentation

    Authors: Shuaibing Wang, Shunli Wang, Mingcheng Li, Dingkang Yang, Haopeng Kuang, Ziyun Qian, Lihua Zhang

    Abstract: Temporal Action Segmentation (TAS) is an essential task in video analysis, aiming to segment and classify continuous frames into distinct action segments. However, the ambiguous boundaries between actions pose a significant challenge for high-precision segmentation. Recent advances in diffusion models have demonstrated substantial success in TAS tasks due to their stable training process and high-…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 25 pages, 6 figures

  50. arXiv:2408.02019  [pdf, other]

    cs.LG

    Personalized Federated Learning on Heterogeneous and Long-Tailed Data via Expert Collaborative Learning

    Authors: Fengling Lv, Xinyi Shang, Yang Zhou, Yiqun Zhang, Mengke Li, Yang Lu

    Abstract: Personalized Federated Learning (PFL) aims to acquire customized models for each client without disclosing raw data by leveraging the collective knowledge of distributed clients. However, the data collected in real-world scenarios is likely to follow a long-tailed distribution. For example, in the medical domain, it is more common for the number of general health notes to be much larger than those…

    Submitted 4 August, 2024; originally announced August 2024.