
Showing 1–50 of 2,411 results for author: Li, L

Searching in archive cs.
  1. arXiv:2409.03431  [pdf, other]

    cs.CV

    UV-Mamba: A DCN-Enhanced State Space Model for Urban Village Boundary Identification in High-Resolution Remote Sensing Images

    Authors: Lulin Li, Ben Chen, Xuechao Zou, Junliang Xing, Pin Tao

    Abstract: Owing to the diverse geographical environments, intricate landscapes, and high-density settlements, the automatic identification of urban village boundaries using remote sensing images is a highly challenging task. This paper proposes a novel and efficient neural network model called UV-Mamba for accurate boundary detection in high-resolution remote sensing images. UV-Mamba mitigates the memory lo…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 5 pages, 4 figures, 2 tables

  2. arXiv:2409.03236  [pdf, other]

    cs.CV

    Unveiling Context-Related Anomalies: Knowledge Graph Empowered Decoupling of Scene and Action for Human-Related Video Anomaly Detection

    Authors: Chenglizhao Chen, Xinyu Liu, Mengke Song, Luming Li, Xu Yu, Shanchen Pang

    Abstract: Detecting anomalies in human-related videos is crucial for surveillance applications. Current methods primarily include appearance-based and action-based techniques. Appearance-based methods rely on low-level visual features such as color, texture, and shape. They learn a large number of pixel patterns and features related to known scenes during training, making them effective in detecting anomali…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 13 pages, 9 figures

  3. arXiv:2409.03213  [pdf, other]

    cs.CV

    Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction

    Authors: Shen Chen, Jiale Zhou, Lei Li

    Abstract: 3D Gaussian Splatting (3DGS) has emerged as a promising approach for 3D scene representation, offering a reduction in computational overhead compared to Neural Radiance Fields (NeRF). However, 3DGS is susceptible to high-frequency artifacts and demonstrates suboptimal performance under sparse viewpoint conditions, thereby limiting its applicability in robotics and computer vision. To address these…

    Submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2409.03192  [pdf, other]

    cs.CV

    PEPL: Precision-Enhanced Pseudo-Labeling for Fine-Grained Image Classification in Semi-Supervised Learning

    Authors: Bowen Tian, Songning Lai, Lujundong Li, Zhihao Shuai, Runwei Guan, Tian Wu, Yutao Yue

    Abstract: Fine-grained image classification has witnessed significant advancements with the advent of deep learning and computer vision technologies. However, the scarcity of detailed annotations remains a major challenge, especially in scenarios where obtaining high-quality labeled data is costly or time-consuming. To address this limitation, we introduce Precision-Enhanced Pseudo-Labeling (PEPL) approach s…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Under review

  5. arXiv:2409.02465  [pdf, other]

    cs.CL

    DetectiveQA: Evaluating Long-Context Reasoning on Detective Novels

    Authors: Zhe Xu, Jiasheng Ye, Xiangyang Liu, Tianxiang Sun, Xiaoran Liu, Qipeng Guo, Linlin Li, Qun Liu, Xuanjing Huang, Xipeng Qiu

    Abstract: With the rapid advancement of Large Language Models (LLMs), long-context information understanding and processing have become a hot topic in academia and industry. However, benchmarks for evaluating the ability of LLMs to handle long-context information do not seem to have kept pace with the development of LLMs. Despite the emergence of various long-context evaluation benchmarks, the types of capa…

    Submitted 4 September, 2024; originally announced September 2024.

  6. arXiv:2409.02388  [pdf, other]

    cs.IT cs.LG

    Gaussian Rate-Distortion-Perception Coding and Entropy-Constrained Scalar Quantization

    Authors: Li Xie, Liangyan Li, Jun Chen, Lei Yu, Zhongshan Zhang

    Abstract: This paper investigates the best known bounds on the quadratic Gaussian distortion-rate-perception function with limited common randomness for the Kullback-Leibler divergence-based perception measure, as well as their counterparts for the squared Wasserstein-2 distance-based perception measure, recently established by Xie et al. These bounds are shown to be nondegenerate in the sense that they can…

    Submitted 3 September, 2024; originally announced September 2024.

  7. arXiv:2409.02050  [pdf, other]

    cs.CL cs.SD eess.AS

    Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model

    Authors: Hukai Huang, Jiayan Lin, Kaidi Wang, Yishuang Li, Wenhao Guan, Lin Li, Qingyang Hong

    Abstract: Due to the inherent difficulty in modeling phonetic similarities across different languages, code-switching speech recognition presents a formidable challenge. This study proposes a Collaborative-MoE, a Mixture of Experts (MoE) model that leverages a collaborative mechanism among expert groups. Initially, a preceding routing network explicitly learns Language Identification (LID) tasks and selects…

    Submitted 5 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by IEEE SLT 2024

  8. arXiv:2409.01707  [pdf, other]

    quant-ph cs.DC

    Quantum Byzantine Agreement Against Full-information Adversary

    Authors: Longcheng Li, Xiaoming Sun, Jiadong Zhu

    Abstract: We exhibit that, when given a classical Byzantine agreement protocol designed in the private-channel model, it is feasible to construct a quantum agreement protocol that can effectively handle a full-information adversary. Notably, both protocols have equivalent levels of resilience, round complexity, and communication complexity. In the classical private-channel scenario, participating players ar…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 26 pages. This is the extended version of the paper presented at DISC2024. It includes extra results in Appendix D that were not included in the conference submission due to page constraints

  9. arXiv:2409.01327  [pdf, other]

    cs.CV

    SPDiffusion: Semantic Protection Diffusion for Multi-concept Text-to-image Generation

    Authors: Yang Zhang, Rui Zhang, Xuecheng Nie, Haochen Li, Jikun Chen, Yifan Hao, Xin Zhang, Luoqi Liu, Ling Li

    Abstract: Recent text-to-image models have achieved remarkable success in generating high-quality images. However, when tasked with multi-concept generation which creates images containing multiple characters or objects, existing methods often suffer from attribute confusion, resulting in severe text-image inconsistency. We found that attribute confusion occurs when a certain region of the latent features a…

    Submitted 2 September, 2024; originally announced September 2024.

  10. arXiv:2409.01021   

    cs.CV

    CONDA: Condensed Deep Association Learning for Co-Salient Object Detection

    Authors: Long Li, Nian Liu, Dingwen Zhang, Zhongyu Li, Salman Khan, Rao Anwer, Hisham Cholakkal, Junwei Han, Fahad Shahbaz Khan

    Abstract: Inter-image association modeling is crucial for co-salient object detection. Despite satisfactory performance, previous methods still fall short of sufficient inter-image association modeling, because most of them focus on image feature optimization under the guidance of heuristically calculated raw inter-image associations. They directly rely on raw associations which are not reliable in co…

    Submitted 4 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: There is an error. In Sec 4.1, the number of images in some dataset is incorrect and needs to be revised

    Journal ref: ECCV2024

  11. arXiv:2409.00947  [pdf, other]

    cs.CV cs.AI

    XNet v2: Fewer Limitations, Better Results and Greater Universality

    Authors: Yanfeng Zhou, Lingrui Li, Zichen Wang, Guole Liu, Ziwen Liu, Ge Yang

    Abstract: XNet introduces a wavelet-based X-shaped unified architecture for fully- and semi-supervised biomedical segmentation. So far, however, XNet still faces limitations, including performance degradation when images lack high-frequency (HF) information, underutilization of raw images and insufficient fusion. To address these issues, we propose XNet v2, a low- and high-frequency complementary model.…

    Submitted 2 September, 2024; originally announced September 2024.

  12. arXiv:2408.17084  [pdf]

    cs.DL

    Evaluating the Accuracy of the Labeling System in Web of Science for the Sustainable Development Goals

    Authors: Yu Zhao, Li Li, Zhesi Shen

    Abstract: Monitoring and fostering research aligned with the Sustainable Development Goals (SDGs) is crucial for formulating evidence-based policies, identifying best practices, and promoting global collaboration. The key step is developing a labeling system to map research publications to their related SDGs. The SDGs labeling system integrated in Web of Science (WoS), which assigns citation topics instead…

    Submitted 30 August, 2024; originally announced August 2024.

  13. arXiv:2408.16198  [pdf, other]

    cs.SE

    Chain-of-Experts (CoE): Reverse Engineering Software Bills of Materials for JavaScript Application Bundles through Code Clone Search

    Authors: Leo Song, Steven H. H. Ding, Yuan Tian, Li Tao Li, Philippe Charland, Andrew Walenstein

    Abstract: A Software Bill of Materials (SBoM) is a detailed inventory of all components, libraries, and modules in a software artifact, providing traceability throughout the software supply chain. With the increasing popularity of JavaScript in software engineering due to its dynamic syntax and seamless supply chain integration, the exposure to vulnerabilities and attacks has risen significantly. A JavaScri…

    Submitted 28 August, 2024; originally announced August 2024.

  14. arXiv:2408.14975  [pdf, other]

    cs.CV

    MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer

    Authors: Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan, Jin Wang

    Abstract: Diffusion models have demonstrated superior performance in the field of portrait animation. However, current approaches rely on either visual or audio modality to control character movements, failing to exploit the potential of mixed-modal control. This challenge arises from the difficulty in balancing the weak control strength of audio modality and the strong control strength of visual modality…

    Submitted 27 August, 2024; originally announced August 2024.

  15. arXiv:2408.14342  [pdf, other]

    cs.CV physics.med-ph

    Dual-Domain CLIP-Assisted Residual Optimization Perception Model for Metal Artifact Reduction

    Authors: Xinrui Zhang, Ailong Cai, Shaoyu Wang, Linyuan Wang, Zhizhong Zheng, Lei Li, Bin Yan

    Abstract: Metal artifacts in computed tomography (CT) imaging pose significant challenges to accurate clinical diagnosis. The presence of high-density metallic implants results in artifacts that deteriorate image quality, manifesting in the forms of streaking, blurring, or beam hardening effects, etc. Nowadays, various deep learning-based approaches, particularly generative models, have been proposed for me…

    Submitted 29 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 14 pages, 18 figures

  16. arXiv:2408.13945  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Personalized Topology-Informed 12-Lead ECG Electrode Localization from Incomplete Cardiac MRIs for Efficient Cardiac Digital Twins

    Authors: Lei Li, Hannah Smith, Yilin Lyu, Julia Camps, Blanca Rodriguez, Abhirup Banerjee, Vicente Grau

    Abstract: Cardiac digital twins (CDTs) offer personalized in-silico cardiac representations for the inference of multi-scale properties tied to cardiac mechanisms. The creation of CDTs requires precise information about the electrode position on the torso, especially for the personalized electrocardiogram (ECG) calibration. However, current studies commonly rely on additional acquisition of torso i…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 12 pages

  17. arXiv:2408.13902  [pdf, other]

    cs.CV cs.LG cs.RO

    TraIL-Det: Transformation-Invariant Local Feature Networks for 3D LiDAR Object Detection with Unsupervised Pre-Training

    Authors: Li Li, Tanqiu Qiao, Hubert P. H. Shum, Toby P. Breckon

    Abstract: 3D point clouds are essential for perceiving outdoor scenes, especially within the realm of autonomous driving. Recent advances in 3D LiDAR Object Detection focus primarily on the spatial positioning and distribution of points to ensure accurate detection. However, despite their robust performance in variable conditions, these methods are hindered by their sole reliance on coordinates and point in…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: BMVC 2024; 15 pages, 3 figures, 3 tables; Code at https://github.com/l1997i/rapid_seg

    Journal ref: Brit. Mach. Vis. Conf. (BMVC 2024)

  18. arXiv:2408.13852  [pdf, other]

    cs.CV

    LaneTCA: Enhancing Video Lane Detection with Temporal Context Aggregation

    Authors: Keyi Zhou, Li Li, Wengang Zhou, Yonghui Wang, Hao Feng, Houqiang Li

    Abstract: In video lane detection, there are rich temporal contexts among successive frames, which are under-explored in existing lane detectors. In this work, we propose LaneTCA to bridge the individual video frames and explore how to effectively aggregate the temporal context. Technically, we develop an accumulative attention module and an adjacent attention module to abstract the long-term and short-term…

    Submitted 25 August, 2024; originally announced August 2024.

  19. arXiv:2408.13832  [pdf, other]

    eess.IV cs.CV

    A Low-dose CT Reconstruction Network Based on TV-regularized OSEM Algorithm

    Authors: Ran An, Yinghui Zhang, Xi Chen, Lemeng Li, Ke Chen, Hongwei Li

    Abstract: Low-dose computed tomography (LDCT) offers significant advantages in reducing the potential harm to human bodies. However, reducing the X-ray dose in CT scanning often leads to severe noise and artifacts in the reconstructed images, which might adversely affect diagnosis. By utilizing the expectation maximization (EM) algorithm, statistical priors could be combined with artificial priors to improv…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 11 pages, 8 figures

    ACM Class: I.4.5

  20. arXiv:2408.13487  [pdf, ps, other]

    cs.LO eess.SY math.OC

    Towards Automatic Linearization via SMT Solving

    Authors: Jian Cao, Liyong Lin, Lele Li

    Abstract: Mathematical optimization is ubiquitous in modern applications. However, in practice, we often need to use nonlinear optimization models, for which the existing optimization tools such as Cplex or Gurobi may not be directly applicable and an (error-prone) manual transformation often has to be done. Thus, to address this issue, in this paper we investigate the problem of automatically verifying and…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 4 pages, conference

  21. arXiv:2408.13455  [pdf]

    cs.DL

    Comparison of Sustainable Development Goals Labeling Systems based on Topic Coverage

    Authors: Li Li, Yu Zhao, Zhesi Shen

    Abstract: With the growing importance of sustainable development goals (SDGs), various labeling systems have emerged for effective monitoring and evaluation. This study assesses six labeling systems across 1.85 million documents at both paper level and topic level. Our findings indicate that the SDGO and SDSN systems are more aggressive, while systems such as Auckland, Aurora, SIRIS, and Elsevier exhibit si…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 17 pages, 6 figures

  22. arXiv:2408.13138  [pdf, ps, other]

    cs.CR

    Tamgram: A Frontend for Large-scale Protocol Modeling in Tamarin

    Authors: Di Long Li, Jim de Groot, Alwen Tiu

    Abstract: Automated security protocol verifiers such as ProVerif and Tamarin have been increasingly applied to verify large scale complex real-world protocols. While their ability to automate difficult reasoning processes required to handle protocols at that scale is impressive, there remains a gap in the modeling languages used. In particular, providing support for writing and maintaining large protocol sp…

    Submitted 23 August, 2024; originally announced August 2024.

  23. arXiv:2408.12948  [pdf, other]

    cs.SE

    E-code: Mastering Efficient Code Generation through Pretrained Models and Expert Encoder Group

    Authors: Yue Pan, Chen Lyu, Zhenyu Yang, Lantian Li, Qi Liu, Xiuting Shao

    Abstract: Context: With the waning of Moore's Law, the software industry is placing increasing importance on finding alternative solutions for continuous performance enhancement. The significance and research results of software performance optimization have been on the rise in recent years, especially with the advancement propelled by Large Language Models (LLMs). However, traditional strategies for rectify…

    Submitted 23 August, 2024; originally announced August 2024.

  24. arXiv:2408.12665  [pdf, ps, other]

    cs.LG cs.AI cs.GR

    Fairness-Aware Streaming Feature Selection with Causal Graphs

    Authors: Leizhen Zhang, Lusi Li, Di Wu, Sheng Chen, Yi He

    Abstract: Its crux lies in the optimization of a tradeoff between accuracy and fairness of resultant models on the selected feature subset. The technical challenge of our setting is twofold: 1) streaming feature inputs, such that an informative feature may become obsolete or redundant for prediction if its information has been covered by other similar features that arrived prior to it, and 2) non-associatio…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by the 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2024)

  25. arXiv:2408.11962  [pdf]

    cs.SI cs.CL

    Characterizing Online Toxicity During the 2022 Mpox Outbreak: A Computational Analysis of Topical and Network Dynamics

    Authors: Lizhou Fan, Lingyao Li, Libby Hemphill

    Abstract: Background: Online toxicity, encompassing behaviors such as harassment, bullying, hate speech, and the dissemination of misinformation, has become a pressing social concern in the digital age. The 2022 Mpox outbreak, initially termed "Monkeypox" but subsequently renamed to mitigate associated stigmas and societal concerns, serves as a poignant backdrop to this issue. Objective: In this research, w…

    Submitted 31 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 36 pages, 8 figures, and 12 tables

  26. arXiv:2408.10880  [pdf, other]

    cs.CV

    Open 3D World in Autonomous Driving

    Authors: Xinlong Cheng, Lei Li

    Abstract: The capability for open vocabulary perception represents a significant advancement in autonomous driving systems, facilitating the comprehension and interpretation of a wide array of textual inputs in real-time. Despite extensive research in open vocabulary tasks within 2D computer vision, the application of such methodologies to 3D environments, particularly within large-scale outdoor contexts, r…

    Submitted 20 August, 2024; originally announced August 2024.

  27. arXiv:2408.10826  [pdf, other]

    cs.DC

    NeuLite: Memory-Efficient Federated Learning via Elastic Progressive Training

    Authors: Yebo Wu, Li Li, Chunlin Tian, Dubing Chen, Chengzhong Xu

    Abstract: Federated Learning (FL) emerges as a new learning paradigm that enables multiple devices to collaboratively train a shared model while preserving data privacy. However, intensive memory footprint during the training process severely bottlenecks the deployment of FL on resource-constrained devices in real-world cases. In this paper, we propose NeuLite, a framework that breaks the memory wall throug…

    Submitted 20 August, 2024; originally announced August 2024.

  28. arXiv:2408.10469  [pdf, other]

    cs.CV cs.IR

    LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS

    Authors: Xinyu Liu, Jing Zhang, Kexin Zhang, Xu Liu, Lingling Li

    Abstract: Video Object Segmentation (VOS) presents several challenges, including object occlusion and fragmentation, the disappearance and reappearance of objects, and tracking specific objects within crowded scenes. In this work, we combine the strengths of the state-of-the-art (SOTA) models SAM2 and Cutie to address these challenges. Additionally, we explore the impact of various hyperparameters on vide…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2406.03668

  29. arXiv:2408.10280  [pdf, other]

    cs.LG

    NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models

    Authors: Cheng Lin, Lujun Li, Dezhi Li, Jie Zou, Wei Xue, Yike Guo

    Abstract: In this paper, we introduce Nested Low-Rank Adaptation (NoRA), a novel approach to parameter-efficient fine-tuning that extends the capabilities of Low-Rank Adaptation (LoRA) techniques. Vanilla LoRA overlooks pre-trained weight inheritance and still requires fine-tuning numerous parameters. To address these issues, our NoRA adopts a dual-layer nested structure with Singular Value Decomposition…

    Submitted 27 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: Work in progress, revisions ongoing

  30. arXiv:2408.09430  [pdf, other]

    cs.CL cs.AI

    FASST: Fast LLM-based Simultaneous Speech Translation

    Authors: Siqi Ouyang, Xi Xu, Chinmay Dandekar, Lei Li

    Abstract: Simultaneous speech translation (SST) takes streaming speech input and generates text translation on the fly. Existing methods either have high latency due to recomputation of input representations, or fall behind offline ST in translation quality. In this paper, we propose FASST, a fast large language model based method for streaming speech translation. We propose blockwise-causal speech encod…

    Submitted 18 August, 2024; originally announced August 2024.

  31. arXiv:2408.09101  [pdf, other]

    cs.DC

    Heterogeneity-Aware Memory Efficient Federated Learning via Progressive Layer Freezing

    Authors: Wu Yebo, Li Li, Tian Chunlin, Chang Tao, Lin Chi, Wang Cong, Xu Cheng-Zhong

    Abstract: In this paper, we propose SmartFreeze, a framework that effectively reduces the memory footprint by conducting the training in a progressive manner. Instead of updating the full model in each training round, SmartFreeze divides the shared model into blocks consisting of a specified number of layers. It first trains the front block with a well-designed output module, safely freezes it after converg…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Published as a conference paper at IWQoS 2024

  32. The Unique Citing Documents Journal Impact Factor (Uniq-JIF) as a Supplement for the standard Journal Impact Factor

    Authors: Zhesi Shen, Li Li, Yu Liao

    Abstract: This paper introduces the Unique Citing Documents Journal Impact Factor (Uniq-JIF) as a supplement to the traditional Journal Impact Factor (JIF). The Uniq-JIF counts each citing document only once, aiming to reduce the effects of citation manipulations. Analysis of 2023 Journal Citation Reports data shows that for most journals, the Uniq-JIF is less than 20% lower than the JIF, though some journals…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 2 figures

    Journal ref: Journal of Data and Information Science, 9(3), 1-3 (2024)

  33. arXiv:2408.08822  [pdf, ps, other]

    cs.CV

    PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future

    Authors: Guangyi Wang, Yuren Cai, Lijiang Li, Wei Peng, Songzhi Su

    Abstract: Diffusion Probabilistic Models (DPMs) have shown remarkable potential in image generation, but their sampling efficiency is hindered by the need for numerous denoising steps. Most existing solutions accelerate the sampling process by proposing fast ODE solvers. However, the inevitable discretization errors of the ODE solvers are significantly magnified when the number of function evaluations (NFE)…

    Submitted 16 August, 2024; originally announced August 2024.

  34. arXiv:2408.08754  [pdf, other]

    cs.LG

    SE-SGformer: A Self-Explainable Signed Graph Transformer for Link Sign Prediction

    Authors: Lu Li, Jiale Liu, Xingyu Ji, Maojun Wang, Zeyu Zhang

    Abstract: Signed Graph Neural Networks (SGNNs) have been shown to be effective in analyzing complex patterns in real-world situations where positive and negative links coexist. However, SGNN models suffer from poor explainability, which limits their adoption in critical scenarios that require understanding the rationale behind predictions. To the best of our knowledge, there is currently no research work on…

    Submitted 16 August, 2024; originally announced August 2024.

  35. arXiv:2408.08604  [pdf, other]

    cs.CV

    Bi-Directional Deep Contextual Video Compression

    Authors: Xihua Sheng, Li Li, Dong Liu, Shiqi Wang

    Abstract: Deep video compression has made remarkable progress in recent years, with the majority of advancements concentrated on P-frame coding. Although efforts to enhance B-frame coding are ongoing, their compression performance is still far behind that of traditional bi-directional video codecs. In this paper, we introduce a bi-directional deep contextual video compression scheme tailored for B-frames, te…

    Submitted 16 August, 2024; originally announced August 2024.

  36. arXiv:2408.08493  [pdf, other]

    cs.LG stat.ML

    Fishers Harvest Parallel Unlearning in Inherited Model Networks

    Authors: Xiao Liu, Mingyuan Li, Xu Wang, Guangsheng Yu, Wei Ni, Lixiang Li, Haipeng Peng, Renping Liu

    Abstract: Unlearning in various learning frameworks remains challenging, with the continuous growth and updates of models exhibiting complex inheritance relationships. This paper presents a novel unlearning framework, which enables fully parallel unlearning among models exhibiting inheritance. A key enabler is the new Unified Model Inheritance Graph (UMIG), which captures the inheritance using a Directed Ac…

    Submitted 20 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  37. arXiv:2408.07986  [pdf, other]

    cs.LG

    Experimental evaluation of offline reinforcement learning for HVAC control in buildings

    Authors: Jun Wang, Linyan Li, Qi Liu, Yu Yang

    Abstract: Reinforcement learning (RL) techniques have been increasingly investigated for dynamic HVAC control in buildings. However, most studies focus on exploring solutions in online or off-policy scenarios without discussing in detail the implementation feasibility or effectiveness of dealing with purely offline datasets or trajectories. The lack of these works limits the real-world deployment of RL-base…

    Submitted 15 August, 2024; originally announced August 2024.

  38. arXiv:2408.07471  [pdf, other]

    cs.CL

    Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

    Authors: Yuxin Jiang, Bo Huang, Yufei Wang, Xingshan Zeng, Liangyou Li, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Wei Wang

    Abstract: Direct preference optimization (DPO), a widely adopted offline preference optimization algorithm, aims to align large language models (LLMs) with human-desired behaviors using pairwise preference data. However, the winning response and the losing response within pairwise data are generated in isolation, leading to weak correlations between them as well as suboptimal alignment performance. To address…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 18 pages, 8 figures, 8 tables, work in progress

  39. arXiv:2408.07452  [pdf, other]

    cs.CL cs.AI

    CMU's IWSLT 2024 Simultaneous Speech Translation System

    Authors: Xi Xu, Siqi Ouyang, Brian Yan, Patrick Fernandes, William Chen, Lei Li, Graham Neubig, Shinji Watanabe

    Abstract: This paper describes CMU's submission to the IWSLT 2024 Simultaneous Speech Translation (SST) task for translating English speech to German text in a streaming manner. Our end-to-end speech-to-text (ST) system integrates the WavLM speech encoder, a modality adapter, and the Llama2-7B-Base model as the decoder. We employ a two-stage training approach: initially, we align the representations of spee…

    Submitted 14 August, 2024; originally announced August 2024.

  40. InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning

    Authors: Bo-Wen Zhang, Yan Yan, Lin Li, Guang Liu

    Abstract: Recent advancements in Chain-of-Thoughts (CoT) and Program-of-Thoughts (PoT) methods have greatly enhanced language models' mathematical reasoning capabilities, facilitating their integration into instruction tuning datasets with LLMs. However, existing methods for large-scale dataset creation require substantial seed data and high computational costs for data synthesis, posing significant challen…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by CIKM 2024

    ACM Class: I.2.7

  41. arXiv:2408.07060  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

    Authors: Kexun Zhang, Weiran Yao, Zuxin Liu, Yihao Feng, Zhiwei Liu, Rithesh Murthy, Tian Lan, Lei Li, Renze Lou, Jiacheng Xu, Bo Pang, Yingbo Zhou, Shelby Heinecke, Silvio Savarese, Huan Wang, Caiming Xiong

    Abstract: Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks exhibit varying strengths, excelling in certain tasks while underperforming in others. To fully harness the diversity of these agent…

    Submitted 13 August, 2024; originally announced August 2024.

  42. arXiv:2408.06969  [pdf, ps, other]

    cs.NI cs.LG

    IRS-Assisted Lossy Communications Under Correlated Rayleigh Fading: Outage Probability Analysis and Optimization

    Authors: Guanchang Li, Wensheng Lin, Lixin Li, Yixuan He, Fucheng Yang, Zhu Han

    Abstract: This paper focuses on an intelligent reflecting surface (IRS)-assisted lossy communication system with correlated Rayleigh fading. We analyze the correlated channel model and derive the outage probability of the system. Then, we design a deep reinforcement learning (DRL) method to optimize the phase shift of IRS, in order to maximize the received signal power. Moreover, this paper presents results of…

    Submitted 13 August, 2024; originally announced August 2024.

  43. arXiv:2408.05112  [pdf, other]

    cs.LG cs.AI eess.IV

    Semantic Successive Refinement: A Generative AI-aided Semantic Communication Framework

    Authors: Kexin Zhang, Lixin Li, Wensheng Lin, Yuna Yan, Rui Li, Wenchi Cheng, Zhu Han

    Abstract: Semantic Communication (SC) is an emerging technology aiming to surpass the Shannon limit. Traditional SC strategies often minimize signal distortion between the original and reconstructed data, neglecting perceptual quality, especially in low Signal-to-Noise Ratio (SNR) environments. To address this issue, we introduce a novel Generative AI Semantic Communication (GSC) system for single-user scen…

    Submitted 31 July, 2024; originally announced August 2024.

  44. arXiv:2408.04905  [pdf, other]

    cs.CL cs.AI

    GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models

    Authors: Zhibo Zhang, Wuxia Bai, Yuxi Li, Mark Huasong Meng, Kailong Wang, Ling Shi, Li Li, Jun Wang, Haoyu Wang

    Abstract: Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has brought many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens". Those tokens, once included in th…

    Submitted 9 August, 2024; originally announced August 2024.

  45. arXiv:2408.04194  [pdf, other]

    cs.SE cs.CR

    FDI: Attack Neural Code Generation Systems through User Feedback Channel

    Authors: Zhensu Sun, Xiaoning Du, Xiapu Luo, Fu Song, David Lo, Li Li

    Abstract: Neural code generation systems have recently attracted increasing attention to improve developer productivity and speed up software development. Typically, these systems maintain a pre-trained neural model and make it available to general users as a service (e.g., through remote APIs) and incorporate a feedback mechanism to extensively collect and utilize the users' reaction to the generated code,…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted by ISSTA'24

  46. arXiv:2408.04163  [pdf, other]

    cs.SI

    Academic collaboration on large language model studies increases overall but varies across disciplines

    Authors: Lingyao Li, Ly Dinh, Songhua Hu, Libby Hemphill

    Abstract: Interdisciplinary collaboration is crucial for addressing complex scientific challenges. Recent advancements in large language models (LLMs) have shown significant potential in benefiting researchers across various fields. To explore the application of LLMs in scientific disciplines and their implications for interdisciplinary collaboration, we collect and analyze 50,391 papers from OpenAlex, an o…

    Submitted 7 August, 2024; originally announced August 2024.

  47. arXiv:2408.04158  [pdf, other]

    eess.IV cs.CV

    Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation

    Authors: Xiaole Zhao, Linze Li, Chengxing Xie, Xiaoming Zhang, Ting Jiang, Wenjie Lin, Shuaicheng Liu, Tianrui Li

    Abstract: Transformer-based deep models for single image super-resolution (SISR) have greatly improved the performance of lightweight SISR tasks in recent years. However, they often suffer from heavy computational burden and slow inference due to the complex calculation of multi-head self-attention (MSA), seriously hindering their practical application and deployment. In this work, we present an efficient S…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024

  48. arXiv:2408.03703  [pdf, other]

    cs.CV

    CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications

    Authors: Tianfang Zhang, Lei Li, Yang Zhou, Wentao Liu, Chen Qian, Xiangyang Ji

    Abstract: Vision Transformers (ViTs) mark a revolutionary advance in neural networks with their token mixer's powerful global context capability. However, the pairwise token affinity and complex matrix operations limit its deployment on resource-constrained scenarios and real-time applications, such as mobile devices, although considerable efforts have been made in previous works. In this paper, we introduc…

    Submitted 7 August, 2024; originally announced August 2024.

  49. arXiv:2408.03521  [pdf, other]

    cs.CV

    SwinShadow: Shifted Window for Ambiguous Adjacent Shadow Detection

    Authors: Yonghui Wang, Shaokai Liu, Li Li, Wengang Zhou, Houqiang Li

    Abstract: Shadow detection is a fundamental and challenging task in many computer vision applications. Intuitively, most shadows come from the occlusion of light by the object itself, resulting in the object and its shadow being contiguous (referred to as the adjacent shadow in this paper). In this case, when the color of the object is similar to that of the shadow, existing methods struggle to achieve accu…

    Submitted 6 August, 2024; originally announced August 2024.

  50. arXiv:2408.03247  [pdf, other]

    cs.CL cs.AI

    Unveiling Factual Recall Behaviors of Large Language Models through Knowledge Neurons

    Authors: Yifei Wang, Yuheng Chen, Wanting Wen, Yu Sheng, Linjing Li, Daniel Dajun Zeng

    Abstract: In this paper, we investigate whether Large Language Models (LLMs) actively recall or retrieve their internal repositories of factual knowledge when faced with reasoning tasks. Through an analysis of LLMs' internal factual recall at each reasoning step via Knowledge Neurons, we reveal that LLMs fail to harness the critical factual associations under certain circumstances. Instead, they tend to opt…

    Submitted 12 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.