Showing 1–50 of 4,929 results for author: Li, Z

Searching in archive cs.
  1. arXiv:2409.03752  [pdf, other]

    cs.CL

    Attention Heads of Large Language Models: A Survey

    Authors: Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Bo Tang, Feiyu Xiong, Zhiyu Li

    Abstract: Since the advent of ChatGPT, Large Language Models (LLMs) have excelled in various tasks but remain largely as black-box systems. Consequently, their development relies heavily on data-driven approaches, limiting performance enhancement through changes in internal architecture and reasoning pathways. As a result, many researchers have begun exploring the potential internal mechanisms of LLMs, aimi…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 20 pages, 11 figures, 4 tables

  2. arXiv:2409.03393  [pdf, other]

    cs.NI

    VQ-DeepVSC: A Dual-Stage Vector Quantization Framework for Video Semantic Communication

    Authors: Yongyi Miao, Zhongdang Li, Yang Wang, Die Hu, Jun Yan, Youfang Wang

    Abstract: In response to the rapid growth of global video traffic and the limitations of traditional wireless transmission systems, we propose a novel dual-stage vector quantization framework, VQ-DeepVSC, tailored to enhance video transmission over wireless channels. In the first stage, we design the adaptive keyframe extractor and interpolator, deployed respectively at the transmitter and receiver, which i…

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2409.03164  [pdf, other]

    cs.LG cs.GR

    A Scalable Matrix Visualization for Understanding Tree Ensemble Classifiers

    Authors: Zhen Li, Weikai Yang, Jun Yuan, Jing Wu, Changjian Chen, Yao Ming, Fan Yang, Hui Zhang, Shixia Liu

    Abstract: The high performance of tree ensemble classifiers benefits from a large set of rules, which, in turn, makes the models hard to understand. To improve interpretability, existing methods extract a subset of rules for approximation using model reduction techniques. However, by focusing on the reduced rule set, these methods often lose fidelity and ignore anomalous rules that, despite their infrequenc…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 15 pages, 10 figures

  4. arXiv:2409.03142  [pdf, other]

    cs.LG stat.ML

    Causal Temporal Representation Learning with Nonstationary Sparse Transition

    Authors: Xiangchen Song, Zijian Li, Guangyi Chen, Yujia Zheng, Yewen Fan, Xinshuai Dong, Kun Zhang

    Abstract: Causal Temporal Representation Learning (Ctrl) methods aim to identify the temporal causal dynamics of complex nonstationary temporal sequences. Despite the success of existing Ctrl methods, they require either directly observing the domain variables or assuming a Markov prior on them. Such requirements limit the application of these methods in real-world scenarios when we do not have such prior k…

    Submitted 4 September, 2024; originally announced September 2024.

  5. arXiv:2409.02802  [pdf, other]

    cs.LG cs.CR stat.ML

    Boosting Certificate Robustness for Time Series Classification with Efficient Self-Ensemble

    Authors: Chang Dong, Zhengyang Li, Liangwei Zheng, Weitong Chen, Wei Emma Zhang

    Abstract: Recently, the issue of adversarial robustness in the time series domain has garnered significant attention. However, the available defense mechanisms remain limited, with adversarial training being the predominant approach, though it does not provide theoretical guarantees. Randomized Smoothing has emerged as a standout method due to its ability to certify a provable lower bound on robustness radi…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 6 figures, 4 tables, 10 pages

    ACM Class: H.3.3

  6. arXiv:2409.02760  [pdf, other]

    cs.AI

    An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting

    Authors: Zhuolin Li, Zhen Zhang, Witold Pedrycz

    Abstract: This paper introduces a novel incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting (MCS) problems, enabling decision makers to progressively provide assignment example preference information. Specifically, we first construct a max-margin optimization-based model to model potentially non-monotonic preferences and inconsistent…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 37 pages, 22 figures

  7. arXiv:2409.02738  [pdf, other]

    cs.RO

    SOAR: Simultaneous Exploration and Photographing with Heterogeneous UAVs for Fast Autonomous Reconstruction

    Authors: Mingjie Zhang, Chen Feng, Zengzhi Li, Guiyong Zheng, Yiming Luo, Zhu Wang, Jinni Zhou, Shaojie Shen, Boyu Zhou

    Abstract: Unmanned Aerial Vehicles (UAVs) have gained significant popularity in scene reconstruction. This paper presents SOAR, a LiDAR-Visual heterogeneous multi-UAV system specifically designed for fast autonomous reconstruction of complex environments. Our system comprises a LiDAR-equipped explorer with a large field-of-view (FoV), alongside photographers equipped with cameras. To ensure rapid acquisitio…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Accepted to IROS2024. Code: https://github.com/SYSU-STAR/SOAR. Project page: http://sysu-star.com/SOAR/

  8. arXiv:2409.02702  [pdf, other]

    cs.SI cs.AI

    Incorporating Like-Minded Peers to Overcome Friend Data Sparsity in Session-Based Social Recommendations

    Authors: Chunyan An, Yunhan Li, Qiang Yang, Winston K. G. Seah, Zhixu Li, Conghao Yang

    Abstract: Session-based Social Recommendation (SSR) leverages social relationships within online networks to enhance the performance of Session-based Recommendation (SR). However, existing SSR algorithms often encounter the challenge of "friend data sparsity". Moreover, significant discrepancies can exist between the purchase preferences of social network friends and those of the target user, reducing the…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: None

  9. arXiv:2409.02561  [pdf, other]

    cs.AI cs.RO

    Vision-Language Navigation with Continual Learning

    Authors: Zhiyuan Li, Yanfeng Lv, Ziqin Tu, Di Shang, Hong Qiao

    Abstract: Vision-language navigation (VLN) is a critical domain within embedded intelligence, requiring agents to navigate 3D environments based on natural language instructions. Traditional VLN research has focused on improving environmental understanding and decision accuracy. However, these approaches often exhibit a significant performance gap when agents are deployed in novel environments, mainly due t…

    Submitted 4 September, 2024; originally announced September 2024.

  10. arXiv:2409.02522  [pdf, other]

    cs.AI cs.RO

    Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments

    Authors: Zhiyuan Li, Yanfeng Lu, Yao Mu, Hong Qiao

    Abstract: Vision Language Navigation in Continuous Environments (VLN-CE) represents a frontier in embodied AI, demanding agents to navigate freely in unbounded 3D spaces solely guided by natural language instructions. This task introduces distinct challenges in multimodal comprehension, spatial reasoning, and decision-making. To address these challenges, we introduce Cog-GA, a generative agent founded on la…

    Submitted 4 September, 2024; originally announced September 2024.

  11. arXiv:2409.01995  [pdf, other]

    eess.AS cs.AI cs.SD

    vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders

    Authors: Yiwei Guo, Zhihan Li, Junjie Li, Chenpeng Du, Hankun Wang, Shuai Wang, Xie Chen, Kai Yu

    Abstract: We propose a new speech discrete token vocoder, vec2wav 2.0, which advances voice conversion (VC). We use discrete tokens from speech self-supervised models as the content features of source speech, and treat VC as a prompted vocoding task. To amend the loss of speaker timbre in the content tokens, vec2wav 2.0 utilizes the WavLM features to provide strong timbre-dependent information. A novel adap…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 5 pages, 4 figures

  12. arXiv:2409.01976  [pdf, other]

    cs.CR

    Benchmarking ZK-Friendly Hash Functions and SNARK Proving Systems for EVM-compatible Blockchains

    Authors: Hanze Guo, Yebo Feng, Cong Wu, Zengpeng Li, Jiahua Xu

    Abstract: With the rapid development of Zero-Knowledge Proofs (ZKPs), particularly Succinct Non-Interactive Arguments of Knowledge (SNARKs), benchmarking various ZK tools has become a valuable task. ZK-friendly hash functions, as key algorithms in blockchain, have garnered significant attention. Therefore, comprehensive benchmarking and evaluations of these evolving algorithms in ZK circuits present both pr…

    Submitted 3 September, 2024; originally announced September 2024.

  13. arXiv:2409.01944  [pdf, other]

    cs.CL

    FuzzCoder: Byte-level Fuzzing Test via Large Language Model

    Authors: Liqun Yang, Jian Yang, Chaoren Wei, Guanglin Niu, Ge Zhang, Yunli Wang, Linzheng Chai, Wanxu Xia, Hongcheng Guo, Shun Zhang, Jiaheng Liu, Yuwei Yin, Junran Peng, Jiaxin Ma, Liang Sun, Zhoujun Li

    Abstract: Fuzzing is an important dynamic program analysis technique designed for finding vulnerabilities in complex software. Fuzzing involves presenting a target program with crafted malicious input to cause crashes, buffer overflows, memory errors, and exceptions. Crafting malicious inputs in an efficient manner is a difficult open problem and the best approaches often apply uniform random mutations to p…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 11 pages

  14. arXiv:2409.01612  [pdf, other]

    cs.AI cs.LG

    Lexicographic optimization-based approaches to learning a representative model for multi-criteria sorting with non-monotonic criteria

    Authors: Zhen Zhang, Zhuolin Li, Wenyu Yu

    Abstract: Deriving a representative model using value function-based methods from the perspective of preference disaggregation has emerged as a prominent and growing topic in multi-criteria sorting (MCS) problems. A noteworthy observation is that many existing approaches to learning a representative model for MCS problems traditionally assume the monotonicity of criteria, which may not always align with the…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 45 pages, 12 figures

  15. arXiv:2409.01552  [pdf, other]

    cs.CL cs.AI

    Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs

    Authors: Zhuo Li, Yuhao Du, Jinpeng Hu, Xiang Wan, Anningzhe Gao

    Abstract: Large language models (LLMs) have shown success in generating high-quality responses. To better align LLMs with human preference, various works have been proposed based on specific optimization processes, which, however, are not suitable for Black-Box LLMs like GPT-4 because their parameters are inaccessible. In the Black-Box LLM case, their performance is highly dependent on the quality of the…

    Submitted 2 September, 2024; originally announced September 2024.

  16. arXiv:2409.01380  [pdf, other]

    cs.CR cs.CL

    Membership Inference Attacks Against In-Context Learning

    Authors: Rui Wen, Zheng Li, Michael Backes, Yang Zhang

    Abstract: Adapting Large Language Models (LLMs) to specific tasks introduces concerns about computational efficiency, prompting an exploration of efficient methods such as In-Context Learning (ICL). However, the vulnerability of ICL to privacy attacks under realistic assumptions remains largely unexplored. In this work, we present the first membership inference attack tailored for ICL, relying solely on gen…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: To Appear in the ACM Conference on Computer and Communications Security, October 14-18, 2024

  17. arXiv:2409.01256  [pdf, other]

    cs.CV cs.AI

    Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling

    Authors: Haicheng Liao, Yongkang Li, Chengyue Wang, Songning Lai, Zhenning Li, Zilin Bian, Jaeyoung Lee, Zhiyong Cui, Guohui Zhang, Chengzhong Xu

    Abstract: The primary goal of traffic accident anticipation is to foresee potential accidents in real time using dashcam videos, a task that is pivotal for enhancing the safety and reliability of autonomous driving technologies. In this study, we introduce an innovative framework, AccNet, which significantly advances the prediction capabilities beyond the current state-of-the-art (SOTA) 2D-based methods by…

    Submitted 2 September, 2024; originally announced September 2024.

  18. arXiv:2409.01199  [pdf, other]

    cs.CV eess.IV

    OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

    Authors: Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinghua Cheng, Li Yuan

    Abstract: Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). With the same reconstruction quality, the more sufficient the VAE's compression for videos is, the more efficient the LVDMs are. However, most LVDMs utilize 2D image VAE, whose compression for videos is only in the spatial dimension and often ign…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: https://github.com/PKU-YuanGroup/Open-Sora-Plan

  19. LLM-PQA: LLM-enhanced Prediction Query Answering

    Authors: Ziyu Li, Wenjie Zhao, Asterios Katsifodimos, Rihan Hai

    Abstract: The advent of Large Language Models (LLMs) provides an opportunity to change the way queries are processed, moving beyond the constraints of conventional SQL-based database systems. However, using an LLM to answer a prediction query is still challenging, since an external ML model has to be employed and inference has to be performed in order to provide an answer. This paper introduces LLM-PQA, a n…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: This paper is accepted as a demo at CIKM 2024

  20. arXiv:2409.01081  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Beyond Efficiency: Molecular Data Pruning for Enhanced Generalization

    Authors: Dingshuo Chen, Zhixun Li, Yuyan Ni, Guibin Zhang, Ding Wang, Qiang Liu, Shu Wu, Jeffrey Xu Yu, Liang Wang

    Abstract: With the emergence of various molecular tasks and massive datasets, how to perform efficient training has become an urgent yet under-explored issue in the area. Data pruning (DP), as an oft-stated approach to saving training burdens, filters out less influential samples to form a coreset for training. However, the increasing reliance on pretrained models for molecular tasks renders traditional in-…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 20 pages, under review

  21. arXiv:2409.01021   

    cs.CV

    CONDA: Condensed Deep Association Learning for Co-Salient Object Detection

    Authors: Long Li, Nian Liu, Dingwen Zhang, Zhongyu Li, Salman Khan, Rao Anwer, Hisham Cholakkal, Junwei Han, Fahad Shahbaz Khan

    Abstract: Inter-image association modeling is crucial for co-salient object detection. Despite satisfactory performance, previous methods still have limitations on sufficient inter-image association modeling, because most of them focus on image feature optimization under the guidance of heuristically calculated raw inter-image associations. They directly rely on raw associations which are not reliable in co…

    Submitted 4 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: There is an error. In Sec 4.1, the number of images in some dataset is incorrect and needs to be revised

    Journal ref: ECCV2024

  22. arXiv:2409.01020  [pdf, other]

    cs.CV eess.IV

    Fed-MUnet: Multi-modal Federated Unet for Brain Tumor Segmentation

    Authors: Ruojun Zhou, Lisha Qu, Lei Zhang, Ziming Li, Hongwei Yu, Bing Luo

    Abstract: Deep learning-based techniques have been widely utilized for brain tumor segmentation using both single and multi-modal Magnetic Resonance Imaging (MRI) images. Most current studies focus on centralized training due to the intrinsic challenge of data sharing across clinics. To mitigate privacy concerns, researchers have introduced Federated Learning (FL) methods to brain tumor segmentation tasks.…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 6 pages, 3 figures, 2 tables. It was accepted by 2024 IEEE International Conference on E-health Networking, Application & Services (HealthCom)

  23. arXiv:2409.00966  [pdf, other]

    math.PR cs.DS cs.LG math.ST

    A computational transition for detecting correlated stochastic block models by low-degree polynomials

    Authors: Guanyi Chen, Jian Ding, Shuyang Gong, Zhangsong Li

    Abstract: Detection of correlation in a pair of random graphs is a fundamental statistical and computational problem that has been extensively studied in recent years. In this work, we consider a pair of correlated (sparse) stochastic block models $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$ that are subsampled from a common parent stochastic block model $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon)$ with $k=O(1)$ symmetric communiti…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 75 pages, 2 figures

    MSC Class: Primary 68Q87; Secondary 62M20

  24. arXiv:2409.00884  [pdf]

    eess.IV cs.CV

    A Novel Hybrid Parameter-Efficient Fine-Tuning Approach for Hippocampus Segmentation and Alzheimer's Disease Diagnosis

    Authors: Wangang Cheng, Guanghua He, Keli Hu, Mingyu Fang, Liang Dong, Zhong Li, Hancan Zhu

    Abstract: Deep learning methods have significantly advanced medical image segmentation, yet their success hinges on large volumes of manually annotated data, which require specialized expertise for accurate labeling. Additionally, these methods often demand substantial computational resources, particularly for three-dimensional medical imaging tasks. Consequently, applying deep learning techniques for medic…

    Submitted 1 September, 2024; originally announced September 2024.

  25. arXiv:2409.00088  [pdf, other]

    cs.CL

    On-Device Language Models: A Comprehensive Review

    Authors: Jiajun Xu, Zhiyuan Li, Wei Chen, Qun Wang, Xin Gao, Qi Cai, Ziyuan Ling

    Abstract: The advent of large language models (LLMs) revolutionized natural language processing applications, and running LLMs on edge devices has become increasingly attractive for reasons including reduced latency, data localization, and personalized user experiences. This comprehensive review examines the challenges of deploying computationally expensive LLMs on resource-constrained devices and explores…

    Submitted 25 August, 2024; originally announced September 2024.

    Comments: 38 pages, 6 figures

  26. arXiv:2409.00086  [pdf, other]

    cs.NI cs.AR cs.HC cs.LG eess.SY

    Towards Battery-Free Wireless Sensing via Radio-Frequency Energy Harvesting

    Authors: Tao Ni, Zehua Sun, Mingda Han, Guohao Lan, Yaxiong Xie, Zhenjiang Li, Tao Gu, Weitao Xu

    Abstract: Diverse Wi-Fi-based wireless applications have been proposed, ranging from daily activity recognition to vital sign monitoring. Despite their remarkable sensing accuracy, the high energy consumption and the requirement for customized hardware modification hinder the wide deployment of the existing sensing solutions. In this paper, we propose REHSense, an energy-efficient wireless sensing solution…

    Submitted 25 August, 2024; originally announced September 2024.

  27. arXiv:2409.00040  [pdf, other]

    cs.NI

    Digital Twin-Empowered Routing Management for Reliable Multi-Hop Millimeter Wave V2X

    Authors: Supat Roongpraiwan, Zongdian Li, Tao Yu, Kei Sakaguchi

    Abstract: Digital twin (DT) technology can replicate physical entities in cyberspace. A mobility DT digitalizes connected and autonomous vehicles (CAVs) and their surrounding traffic environment, allowing to monitor the maneuvering and distribution of CAVs in real-time, which is crucial for managing vehicle-to-everything (V2X) connectivity, especially when millimeter wave (mmWave) is adopted. MmWave V2X rel…

    Submitted 18 August, 2024; originally announced September 2024.

  28. arXiv:2408.17253  [pdf, other]

    cs.CV cs.AI

    VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters

    Authors: Mouxiang Chen, Lefei Shen, Zhuo Li, Xiaoyun Joy Wang, Jianling Sun, Chenghao Liu

    Abstract: Foundation models have emerged as a promising approach in time series forecasting (TSF). Existing approaches either fine-tune large language models (LLMs) or build large-scale time-series datasets to develop TSF foundation models. However, these methods face challenges due to the severe cross-domain gap or in-domain heterogeneity. In this paper, we explore a new road to building a TSF foundation m…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 26 pages, 11 figures

  29. arXiv:2408.17227  [pdf]

    stat.AP cs.CE

    A Framework for Digital Asset Risks with Insurance Applications

    Authors: Zhengming Li, Jianxi Su, Maochao Xu, Jimmy Yuen

    Abstract: The remarkable growth of digital assets, starting from the inception of Bitcoin in 2009 into a 1 trillion market in 2024, underscores the momentum behind disruptive technologies and the global appetite for digital assets. This paper develops a framework to enhance actuaries' understanding of the cyber risks associated with the developing digital asset ecosystem, as well as their measurement method…

    Submitted 30 August, 2024; originally announced August 2024.

  30. arXiv:2408.16913  [pdf, other]

    cs.LG cs.AI cs.CR stat.ML

    Analyzing Inference Privacy Risks Through Gradients in Machine Learning

    Authors: Zhuohang Li, Andrew Lowy, Jing Liu, Toshiaki Koike-Akino, Kieran Parsons, Bradley Malin, Ye Wang

    Abstract: In distributed learning settings, models are iteratively updated with shared gradients computed from potentially sensitive user data. While previous work has studied various privacy risks of sharing gradients, our paper aims to provide a systematic approach to analyze private information leakage from gradients. We present a unified game-based framework that encompasses a broad range of attacks inc…

    Submitted 29 August, 2024; originally announced August 2024.

  31. arXiv:2408.16766  [pdf, other]

    cs.CV

    CSGO: Content-Style Composition in Text-to-Image Generation

    Authors: Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Renyuan Huang, Zechao Li

    Abstract: The diffusion model has shown exceptional capabilities in controlled image generation, which has further fueled interest in image style transfer. Existing works mainly focus on training-free methods (e.g., image inversion) due to the scarcity of specific data. In this study, we present a data construction pipeline for content-style-stylized image triplets that generates and automatically cle…

    Submitted 4 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

  32. arXiv:2408.16673  [pdf, other]

    cs.LG cs.AI

    Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity

    Authors: Ziniu Li, Congliang Chen, Tian Xu, Zeyu Qin, Jiancong Xiao, Ruoyu Sun, Zhi-Quan Luo

    Abstract: Large language models rely on Supervised Fine-Tuning (SFT) to specialize in downstream tasks. Cross Entropy (CE) loss is the de facto choice in SFT, but it often leads to overfitting and limited output diversity due to its aggressive updates to the data distribution. This paper aims to address these issues by introducing the maximum entropy principle, which favors models with flatter distributions…

    Submitted 29 August, 2024; originally announced August 2024.

  33. arXiv:2408.16577  [pdf, other]

    cs.LG cs.AI

    Seeking the Sufficiency and Necessity Causal Features in Multimodal Representation Learning

    Authors: Boyu Chen, Junjie Liu, Zhu Li, Mengyue Yang

    Abstract: Learning representations with a high Probability of Necessary and Sufficient Causes (PNS) has been shown to enhance deep learning models' ability. This task involves identifying causal features that are both sufficient (guaranteeing the outcome) and necessary (without which the outcome cannot occur). However, current research predominantly focuses on unimodal data, and extending PNS learning to mu…

    Submitted 29 August, 2024; originally announced August 2024.

  34. arXiv:2408.16288  [pdf, other]

    cs.LG cs.AI cs.DB cs.SI

    OpenFGL: A Comprehensive Benchmarks for Federated Graph Learning

    Authors: Xunkai Li, Yinlin Zhu, Boyang Pang, Guochen Yan, Yeyu Yan, Zening Li, Zhengyu Wu, Wentao Zhang, Rong-Hua Li, Guoren Wang

    Abstract: Federated graph learning (FGL) has emerged as a promising distributed training paradigm for graph neural networks across multiple local systems without direct data sharing. This approach is particularly beneficial in privacy-sensitive scenarios and offers a new perspective on addressing scalability challenges in large-scale graph learning. Despite the proliferation of FGL, the diverse motivations…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Under Review

  35. arXiv:2408.15991  [pdf, other]

    cs.CV

    Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

    Authors: Shengyuan Zhang, Ling Yang, Zejian Li, An Zhao, Chenye Meng, Changyuan Yang, Guang Yang, Zhiyuan Yang, Lingyun Sun

    Abstract: Accelerating the sampling speed of diffusion models remains a significant challenge. Recent score distillation methods distill a heavy teacher model into a one-step student generator, which is optimized by calculating the difference between the two score functions on the samples generated by the student model. However, there is a score mismatch issue in the early stage of the distillation process…

    Submitted 28 August, 2024; originally announced August 2024.

  36. arXiv:2408.15740  [pdf]

    cs.CV

    MambaPlace: Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms

    Authors: Tianyi Shang, Zhenyu Li, Wenhao Pei, Pengjie Xu, ZhaoJun Deng, Fanchen Kong

    Abstract: Vision Language Place Recognition (VLVPR) enhances robot localization performance by incorporating natural language descriptions from images. By utilizing language information, VLVPR directs robot place matching, overcoming the constraint of solely depending on vision. The essence of multimodal fusion lies in mining the complementary information between different modalities. However, general fusio…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 8 pages

  37. arXiv:2408.15702  [pdf, other]

    cs.LG cs.AI

    Evaluating Model Robustness Using Adaptive Sparse L0 Regularization

    Authors: Weiyou Liu, Zhenyang Li, Weitong Chen

    Abstract: Deep Neural Networks have demonstrated remarkable success in various domains but remain susceptible to adversarial examples, which are slightly altered inputs designed to induce misclassification. While adversarial attacks typically optimize under Lp norm constraints, attacks based on the L0 norm, prioritising input sparsity, are less studied due to their complex and non-convex nature. These spars…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by the 20th International Conference on Advanced Data Mining and Applications (ADMA 2024)

    ACM Class: F.2.2, I.2.7

  38. arXiv:2408.15561  [pdf, other]

    cs.AR cs.AI

    CGRA4ML: A Framework to Implement Modern Neural Networks for Scientific Edge Computing

    Authors: G Abarajithan, Zhenghua Ma, Zepeng Li, Shrideep Koparkar, Ravidu Munasinghe, Francesco Restuccia, Ryan Kastner

    Abstract: Scientific edge computing increasingly relies on hardware-accelerated neural networks to implement complex, near-sensor processing at extremely high throughputs and low latencies. Existing frameworks like HLS4ML are effective for smaller models, but struggle with larger, modern neural networks due to their requirement of spatially implementing the neural network layers and storing all weights in o…

    Submitted 28 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  39. arXiv:2408.15518  [pdf, other]

    cs.CL

    Squid: Long Context as a New Modality for Energy-Efficient On-Device Language Models

    Authors: Wei Chen, Zhiyuan Li, Shuo Xin, Yihao Wang

    Abstract: This paper presents Dolphin, a novel decoder-decoder architecture for energy-efficient processing of long contexts in language models. Our approach addresses the significant energy consumption and latency challenges inherent in on-device models. Dolphin employs a compact 0.5B parameter decoder to distill extensive contextual information into a memory embedding, substantially reducing the input len…

    Submitted 3 September, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  40. arXiv:2408.15273  [pdf]

    eess.SP cs.IT

    Concentric UCAs Based Low-Order OAM for High Capacity in Radio Vortex Wireless Communications

    Authors: Haiyue Jing, Wenchi Cheng, Zan Li, Hailin Zhang

    Abstract: Due to its potential for boosting the capacity of wireless communications, the Radio vOrtex Wireless COMMunication (RowComm) over orthogonal states/modes of Orbital Angular Momentum (OAM) has received much attention in recent years. Uniform circular array (UCA), as an efficient and convenient antenna structure, can transmit/receive multiple OAM beams with different OAM-modes simultaneously when the tran…

    Submitted 12 August, 2024; originally announced August 2024.

  41. arXiv:2408.14892  [pdf, other]

    cs.CL cs.SD eess.AS

    A Functional Trade-off between Prosodic and Semantic Cues in Conveying Sarcasm

    Authors: Zhu Li, Xiyuan Gao, Yuqing Zhang, Shekhar Nayak, Matt Coler

    Abstract: This study investigates the acoustic features of sarcasm and disentangles the interplay between the propensity of an utterance being used sarcastically and the presence of prosodic cues signaling sarcasm. Using a dataset of sarcastic utterances compiled from television shows, we analyze the prosodic features within utterances and key phrases belonging to three distinct sarcasm categories (embedded…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: accepted at Interspeech 2024

  42. arXiv:2408.14853  [pdf, other]

    cs.CL cs.AI cs.CR

    Detecting AI Flaws: Target-Driven Attacks on Internal Faults in Language Models

    Authors: Yuhao Du, Zhuo Li, Pengyu Cheng, Xiang Wan, Anningzhe Gao

    Abstract: Large Language Models (LLMs) have become a focal point in the rapidly evolving field of artificial intelligence. However, a critical concern is the presence of toxic content within the pre-training corpus of these models, which can lead to the generation of inappropriate outputs. Investigating methods for detecting internal faults in LLMs can help us understand their limitations and improve their…

    Submitted 27 August, 2024; originally announced August 2024.

  43. arXiv:2408.14851  [pdf, other]

    cs.IR

    Graph and Sequential Neural Networks in Session-based Recommendation: A Survey

    Authors: Zihao Li, Chao Yang, Yakun Chen, Xianzhi Wang, Hongxu Chen, Guandong Xu, Lina Yao, Quan Z. Sheng

    Abstract: Recent years have witnessed the remarkable success of recommendation systems (RSs) in alleviating the information overload problem. As a new paradigm of RSs, session-based recommendation (SR) specializes in users' short-term preference capture and aims to provide a more dynamic and timely recommendation based on the ongoing interacted actions. In this survey, we will give a comprehensive overview…

    Submitted 27 August, 2024; originally announced August 2024.

  44. arXiv:2408.14840  [pdf, other]

    cs.AI cs.CL cs.LG

    CL4KGE: A Curriculum Learning Method for Knowledge Graph Embedding

    Authors: Yang Liu, Chuan Zhou, Peng Zhang, Yanan Cao, Yongchao Liu, Zhao Li, Hongyang Chen

    Abstract: Knowledge graph embedding (KGE) constitutes a foundational task, directed towards learning representations for entities and relations within knowledge graphs (KGs), with the objective of crafting representations comprehensive enough to approximate the logical and symbolic interconnections among entities. In this paper, we define a metric Z-counts to measure the difficulty of training each triple (…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 16 pages, 3 figures

  45. arXiv:2408.14805  [pdf, other]

    cs.CV

    Platypus: A Generalized Specialist Model for Reading Text in Various Forms

    Authors: Peng Wang, Zhaohai Li, Jun Tang, Humen Zhong, Fei Huang, Zhibo Yang, Cong Yao

    Abstract: Reading text from images (either natural scenes or documents) has been a long-standing research topic for decades, due to the high technical challenge and wide application range. Previously, individual specialist models are developed to tackle the sub-tasks of text reading (e.g., scene text recognition, handwritten text recognition and mathematical expression recognition). However, such specialist…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  46. arXiv:2408.14611  [pdf]

    cs.DC cs.DB

    Scalable, reproducible, and cost-effective processing of large-scale medical imaging datasets

    Authors: Michael E. Kim, Karthik Ramadass, Chenyu Gao, Praitayini Kanakaraj, Nancy R. Newlin, Gaurav Rudravaram, Kurt G. Schilling, Blake E. Dewey, Derek Archer, Timothy J. Hohman, Zhiyuan Li, Shunxing Bao, Bennett A. Landman, Nazirah Mohd Khairi

    Abstract: Curating, processing, and combining large-scale medical imaging datasets from national studies is a non-trivial task due to the intense computation and data throughput required, variability of acquired data, and associated financial overhead. Existing platforms or tools for large-scale data curation, processing, and storage have difficulty achieving a viable cost-to-scale ratio of computation spee…

    Submitted 26 August, 2024; originally announced August 2024.

  47. arXiv:2408.14468  [pdf, other]

    cs.AI cs.CV cs.HC

    K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences

    Authors: Zhikai Li, Xuewen Liu, Dongrong Fu, Jianquan Li, Qingyi Gu, Kurt Keutzer, Zhen Dong

    Abstract: The rapid advancement of visual generative models necessitates efficient and reliable evaluation methods. Arena platform, which gathers user votes on model comparisons, can rank models with human preferences. However, traditional Arena methods, while established, require an excessive number of comparisons for ranking to converge and are vulnerable to preference noise in voting, suggesting the need…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Project page: https://huggingface.co/spaces/ksort/K-Sort-Arena

  48. arXiv:2408.14051  [pdf, other]

    cs.CV

    Let Video Teaches You More: Video-to-Image Knowledge Distillation using DEtection TRansformer for Medical Video Lesion Detection

    Authors: Yuncheng Jiang, Zixun Zhang, Jun Wei, Chun-Mei Feng, Guanbin Li, Xiang Wan, Shuguang Cui, Zhen Li

    Abstract: AI-assisted lesion detection models play a crucial role in the early screening of cancer. However, previous image-based models ignore the inter-frame contextual information present in videos. On the other hand, video-based models capture the inter-frame context but are computationally expensive. To mitigate this contradiction, we delve into Video-to-Image knowledge distillation leveraging DEtectio…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: BIBM 2024

  49. arXiv:2408.13985  [pdf, other]

    cs.CL

    TF-Attack: Transferable and Fast Adversarial Attacks on Large Language Models

    Authors: Zelin Li, Kehai Chen, Xuefeng Bai, Lemao Liu, Mingming Yang, Yang Xiang, Min Zhang

    Abstract: With the great advancements in large language models (LLMs), adversarial attacks against LLMs have recently attracted increasing attention. We found that pre-existing adversarial attack methodologies exhibit limited transferability and are notably inefficient, particularly when applied to LLMs. In this paper, we analyze the core mechanisms of previous predominant adversarial attack methods, reveal…

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 14 pages, 6 figures

  50. arXiv:2408.13770  [pdf, other]

    cs.CV

    TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

    Authors: Chuanrui Zhang, Yingshuang Zou, Zhuoling Li, Minmin Yi, Haoqian Wang

    Abstract: Compared with previous 3D reconstruction methods like Nerf, recent Generalizable 3D Gaussian Splatting (G-3DGS) methods demonstrate impressive efficiency even in the sparse-view setting. However, the promising reconstruction performance of existing G-3DGS methods relies heavily on accurate multi-view feature matching, which is quite challenging. Especially for the scenes that have many non-overlap…

    Submitted 25 August, 2024; originally announced August 2024.