Showing 1–50 of 1,267 results for author: He, Y

Searching in archive cs.

  1. arXiv:2409.03183  [pdf, other]

    cs.CL cs.AI

    Bypassing DARCY Defense: Indistinguishable Universal Adversarial Triggers

    Authors: Zuquan Peng, Yuanyuan He, Jianbing Ni, Ben Niu

    Abstract: Neural network (NN) classification models for Natural Language Processing (NLP) are vulnerable to the Universal Adversarial Triggers (UAT) attack that triggers a model to produce a specific prediction for any input. DARCY borrows the "honeypot" concept to bait multiple trapdoors, effectively detecting the adversarial examples generated by UAT. Unfortunately, we find a new UAT generation method, c…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 13 pages, 5 figures

    ACM Class: I.2.7

  2. arXiv:2409.03024  [pdf, other]

    cs.LG

    NUMOSIM: A Synthetic Mobility Dataset with Anomaly Detection Benchmarks

    Authors: Chris Stanford, Suman Adari, Xishun Liao, Yueshuai He, Qinhua Jiang, Chenchen Kuai, Jiaqi Ma, Emmanuel Tung, Yinlong Qian, Lingyi Zhao, Zihao Zhou, Zeeshan Rasheed, Khurram Shafique

    Abstract: Collecting real-world mobility data is challenging. It is often fraught with privacy concerns, logistical difficulties, and inherent biases. Moreover, accurately annotating anomalies in large-scale data is nearly impossible, as it demands meticulous effort to distinguish subtle and complex patterns. These challenges significantly impede progress in geospatial anomaly detection research by restrict…

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2409.02919  [pdf, other]

    cs.CV

    HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts

    Authors: Xinyu Liu, Yingqing He, Lanqing Guo, Xiang Li, Bu Jin, Peng Li, Yan Li, Chi-Min Chan, Qifeng Chen, Wei Xue, Wenhan Luo, Qingfeng Liu, Yike Guo

    Abstract: The potential for higher-resolution image generation using pretrained diffusion models is immense, yet these models often struggle with object repetition and structural artifacts, especially when scaling to 4K resolution and higher. We find that the problem arises because a single prompt provides insufficient efficacy for generation at multiple scales. In response, we propos…

    Submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2409.01668  [pdf, other]

    cs.SD cs.AI eess.AS

    Pureformer-VC: Non-parallel One-Shot Voice Conversion with Pure Transformer Blocks and Triplet Discriminative Training

    Authors: Wenhan Yao, Zedong Xing, Xiarun Chen, Jia Liu, Yongqiang He, Weiping Wen

    Abstract: One-shot voice conversion (VC) aims to change the timbre of any source speech to match that of the unseen target speaker with only one speech sample. Existing style transfer-based VC methods relied on speech representation disentanglement and struggled to accurately and independently encode each speech component and recompose the components into converted speech effectively. To tackle this, we proposed Pu…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

  5. arXiv:2409.01012  [pdf, other]

    cs.IR cs.LG

    Improved Diversity-Promoting Collaborative Metric Learning for Recommendation

    Authors: Shilong Bao, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao, Qingming Huang

    Abstract: Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and collaborative filtering. Following the convention of RS, existing practices exploit unique user representation in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests. Under this settin…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: arXiv admin note: text overlap with arXiv:2209.15292

  6. arXiv:2409.00426  [pdf, other]

    cs.CR

    Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks

    Authors: Yu He, Boheng Li, Yao Wang, Mengda Yang, Juan Wang, Hongxin Hu, Xingyu Zhao

    Abstract: The vulnerability of machine learning models to Membership Inference Attacks (MIAs) has garnered considerable attention in recent years. These attacks determine whether a data sample belongs to the model's training set or not. Recent research has focused on reference-based attacks, which leverage difficulty calibration with independently trained reference models. While empirical studies have demon…

    Submitted 4 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: Accepted by ACM CCS 2024

  7. arXiv:2408.17284  [pdf, other]

    cs.CV

    DCUDF2: Improving Efficiency and Accuracy in Extracting Zero Level Sets from Unsigned Distance Fields

    Authors: Xuhui Chen, Fugang Yu, Fei Hou, Wencheng Wang, Zhebin Zhang, Ying He

    Abstract: Unsigned distance fields (UDFs) allow for the representation of models with complex topologies, but extracting accurate zero level sets from these fields poses significant challenges, particularly in preserving topological accuracy and capturing fine geometric details. To overcome these issues, we introduce DCUDF2, an enhancement over DCUDF--the current state-of-the-art method--for extracting zero…

    Submitted 30 August, 2024; originally announced August 2024.

  8. arXiv:2408.16451  [pdf, other]

    cs.CV

    Weakly Supervised Object Detection for Automatic Tooth-marked Tongue Recognition

    Authors: Yongcun Zhang, Jiajun Xu, Yina He, Shaozi Li, Zhiming Luo, Huangwei Lei

    Abstract: Tongue diagnosis in Traditional Chinese Medicine (TCM) is a crucial diagnostic method that can reflect an individual's health status. Traditional methods for identifying tooth-marked tongues are subjective and inconsistent because they rely on practitioner experience. We propose a novel fully automated Weakly Supervised method using Vision transformer and Multiple instance learning (WSVM) for tongue…

    Submitted 29 August, 2024; originally announced August 2024.

  9. arXiv:2408.16289  [pdf]

    cs.CV

    Convolutional Neural Network Compression Based on Low-Rank Decomposition

    Authors: Yaping He, Linhao Jiang, Di Wu

    Abstract: Deep neural networks typically impose significant computational loads and memory consumption. Moreover, the large number of parameters constrains deploying the model on edge devices such as embedded systems. Tensor decomposition offers a clear advantage in compressing large-scale weight tensors. Nevertheless, direct utilization of low-rank decomposition typically leads to significant accuracy loss.…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 10 pages, 1 figure

  10. arXiv:2408.15508  [pdf, other]

    cs.SD cs.AI eess.AS

    EmoAttack: Utilizing Emotional Voice Conversion for Speech Backdoor Attacks on Deep Speech Classification Models

    Authors: Wenhan Yao, Zedong Xing, Xiarun Chen, Jia Liu, Yongqiang He, Weiping Wen

    Abstract: Deep speech classification tasks, mainly including keyword spotting and speaker verification, play a crucial role in speech-based human-computer interaction. Recently, the security of these technologies has been demonstrated to be vulnerable to backdoor attacks. Specifically speaking, speech samples are attacked by noisy disruption and component modification in present triggers. We suggest that sp…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Submitted to ICASSP 2025

  11. arXiv:2408.14158  [pdf, other]

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei , et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands for computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic…

    Submitted 31 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version.

  12. arXiv:2408.13995  [pdf, other]

    cs.CV

    Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control

    Authors: Yixuan He, Lin Geng Foo, Ajmal Saeed Mian, Hossein Rahmani, Jun Jiu

    Abstract: Language based editing of 3D human avatars to precisely match user requirements is challenging due to the inherent ambiguity and limited expressiveness of natural language. To overcome this, we propose the Avatar Concept Slider (ACS), a 3D avatar editing method that allows precise manipulation of semantic concepts in human avatars towards a specified intermediate point between two extremes of conc…

    Submitted 25 August, 2024; originally announced August 2024.

  13. arXiv:2408.13656  [pdf, other]

    cs.LG cs.CL cs.CV

    Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic

    Authors: Yifei He, Yuzheng Hu, Yong Lin, Tong Zhang, Han Zhao

    Abstract: Model merging offers an effective strategy to combine the strengths of multiple finetuned models into a unified model that preserves the specialized capabilities of each. Existing methods merge models in a global manner, performing arithmetic operations across all model parameters. However, such global merging often leads to task interference, degrading the performance of the merged model. In this…

    Submitted 24 August, 2024; originally announced August 2024.

  14. arXiv:2408.12825  [pdf, other]

    cs.CV

    MergeUp-augmented Semi-Weakly Supervised Learning for WSI Classification

    Authors: Mingxi Ouyang, Yuqiu Fu, Renao Yan, ShanShan Shi, Xitong Ling, Lianghui Zhu, Yonghong He, Tian Guan

    Abstract: Recent advancements in computational pathology and artificial intelligence have significantly improved whole slide image (WSI) classification. However, the gigapixel resolution of WSIs and the scarcity of manual annotations present substantial challenges. Multiple instance learning (MIL) is a promising weakly supervised learning approach for WSI classification. Recent research revealed employing…

    Submitted 23 August, 2024; originally announced August 2024.

  15. arXiv:2408.12665  [pdf, ps, other]

    cs.LG cs.AI cs.GR

    Fairness-Aware Streaming Feature Selection with Causal Graphs

    Authors: Leizhen Zhang, Lusi Li, Di Wu, Sheng Chen, Yi He

    Abstract: Its crux lies in the optimization of a tradeoff between accuracy and fairness of resultant models on the selected feature subset. The technical challenge of our setting is twofold: 1) streaming feature inputs, such that an informative feature may become obsolete or redundant for prediction if its information has been covered by other similar features that arrived prior to it, and 2) non-associatio…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by the 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2024)

  16. arXiv:2408.11878  [pdf, other]

    cs.CL cs.CE q-fin.CP

    Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

    Authors: Qianqian Xie, Dong Li, Mengxi Xiao, Zihao Jiang, Ruoyu Xiang, Xiao Zhang, Zhengyu Chen, Yueru He, Weiguang Han, Yuzhe Yang, Shunian Chen, Yifei Zhang, Lihang Shen, Daniel Kim, Zhiwei Liu, Zheheng Luo, Yangyang Yu, Yupeng Cao, Zhiyang Deng, Zhiyuan Yao, Haohang Li, Duanyu Feng, Yongfu Dai, VijayaSai Somasundaram, Peng Lu , et al. (14 additional authors not shown)

    Abstract: Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data. To address these limitations, we introduce Open-FinLLMs, a series of Financial LLMs. We begin with FinLLaMA, pre-trained on a 52 billion token financial corpus, incorporating text, table…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 33 pages, 13 figures

  17. arXiv:2408.11210  [pdf, other]

    cs.CV

    A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation

    Authors: Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Daguang Xu, Wenqi Li

    Abstract: Since the release of Segment Anything 2 (SAM2), the medical imaging community has been actively evaluating its performance for 3D medical image segmentation. However, different studies have employed varying evaluation pipelines, resulting in conflicting outcomes that obscure a clear understanding of SAM2's capabilities and potential applications. We briefly review existing benchmarks and point out…

    Submitted 20 August, 2024; originally announced August 2024.

  18. arXiv:2408.10774  [pdf, other]

    cs.AI cs.CL

    Flexora: Flexible Low Rank Adaptation for Large Language Models

    Authors: Chenxing Wei, Yao Shu, Ying Tiffany He, Fei Richard Yu

    Abstract: Large Language Models (LLMs) are driving advancements in artificial intelligence by increasing the scale of model parameters, which has significantly enhanced generalization ability and unlocked new capabilities in practice. However, their performance in specific downstream tasks is usually hindered by their knowledge boundaries on these tasks. Thus, fine-tuning techniques, especially the widely u…

    Submitted 21 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: 29 pages, 13 figures

  19. arXiv:2408.10571   

    cs.CV cs.AI

    Prompt-Agnostic Adversarial Perturbation for Customized Diffusion Models

    Authors: Cong Wan, Yuhang He, Xiang Song, Yihong Gong

    Abstract: Diffusion models have revolutionized customized text-to-image generation, allowing for efficient synthesis of photos from personal data with textual descriptions. However, these advancements bring forth risks including privacy breaches and unauthorized replication of artworks. Previous research primarily centers around using prompt-specific methods to generate adversarial examples to protect pers…

    Submitted 29 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: The experiments are insufficient and need to be completed

  20. arXiv:2408.10145  [pdf, other]

    cs.CV

    Multi-Scale Representation Learning for Image Restoration with State-Space Model

    Authors: Yuhong He, Long Peng, Qiaosi Yi, Chen Wu, Lu Wang

    Abstract: Image restoration endeavors to reconstruct a high-quality, detail-rich image from a degraded counterpart, which is a pivotal process in photography and various computer vision systems. In real-world scenarios, different types of degradation can cause the loss of image details at various scales and degrade image contrast. Existing methods predominantly rely on CNN and Transformer to capture multi-s…

    Submitted 19 August, 2024; originally announced August 2024.

  21. arXiv:2408.09920  [pdf, other]

    cs.CV cs.MM eess.IV

    Sliced Maximal Information Coefficient: A Training-Free Approach for Image Quality Assessment Enhancement

    Authors: Kang Xiao, Xu Wang, Yulin He, Baoliang Chen, Xuelin Shen

    Abstract: Full-reference image quality assessment (FR-IQA) models generally operate by measuring the visual differences between a degraded image and its reference. However, existing FR-IQA models, including both the classical ones (e.g., PSNR and SSIM) and deep-learning based measures (e.g., LPIPS and DISTS), still exhibit limitations in capturing the full perception characteristics of the human visual system (HV…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 6 pages, 5 figures, accepted by ICME2024

  22. arXiv:2408.09406  [pdf, other]

    cs.SI physics.soc-ph

    Uncovering multi-order Popularity and Similarity Mechanisms in Link Prediction by graphlet predictors

    Authors: Yong-Jian He, Yijun Ran, Zengru Di, Tao Zhou, Xiao-Ke Xu

    Abstract: Link prediction has become a critical problem in network science and has thus attracted increasing research interest. Popularity and similarity are two primary mechanisms in the formation of real networks. However, the roles of popularity and similarity mechanisms in link prediction across various domain networks remain poorly understood. Accordingly, this study used orbit degrees of graphlets to…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 40 pages, 9 figures

  23. arXiv:2408.09265  [pdf, other]

    cs.CR cs.LG cs.NI eess.SY

    ByCAN: Reverse Engineering Controller Area Network (CAN) Messages from Bit to Byte Level

    Authors: Xiaojie Lin, Baihe Ma, Xu Wang, Guangsheng Yu, Ying He, Ren Ping Liu, Wei Ni

    Abstract: As the primary standard protocol for modern cars, the Controller Area Network (CAN) is a critical research target for automotive cybersecurity threats and autonomous applications. As the decoding specification of CAN is a proprietary black-box maintained by Original Equipment Manufacturers (OEMs), conducting related research and industry developments can be challenging without a comprehensive unde…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Internet of Things Journal, 15 pages, 5 figures, 6 tables

  24. arXiv:2408.08248  [pdf, other]

    cs.AI

    Conformalized Answer Set Prediction for Knowledge Graph Embedding

    Authors: Yuqicheng Zhu, Nico Potyka, Jiarong Pan, Bo Xiong, Yunjie He, Evgeny Kharlamov, Steffen Staab

    Abstract: Knowledge graph embeddings (KGE) apply machine learning methods on knowledge graphs (KGs) to provide non-classical reasoning capabilities based on similarities and analogies. The learned KG embeddings are typically used to answer queries by ranking all potential answers, but rankings often lack a meaningful probabilistic interpretation - lower-ranked answers do not necessarily have a lower probabi…

    Submitted 25 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: Under Review

  25. arXiv:2408.08226  [pdf, other]

    cs.AI

    Predictive Multiplicity of Knowledge Graph Embeddings in Link Prediction

    Authors: Yuqicheng Zhu, Nico Potyka, Mojtaba Nayyeri, Bo Xiong, Yunjie He, Evgeny Kharlamov, Steffen Staab

    Abstract: Knowledge graph embedding (KGE) models are often used to predict missing links for knowledge graphs (KGs). However, multiple KG embeddings can perform almost equally well for link prediction yet suggest conflicting predictions for certain queries, termed predictive multiplicity in the literature. This behavior poses substantial risks for KGE-based applications in high-stake domains but has be…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Under Review

  26. arXiv:2408.08134  [pdf, other]

    cs.CV

    CorrAdaptor: Adaptive Local Context Learning for Correspondence Pruning

    Authors: Wei Zhu, Yicheng Liu, Yuping He, Tangfei Liao, Kang Zheng, Xiaoqiu Xu, Tao Wang, Tong Lu

    Abstract: In the fields of computer vision and robotics, accurate pixel-level correspondences are essential for enabling advanced tasks such as structure-from-motion and simultaneous localization and mapping. Recent correspondence pruning methods usually focus on learning local consistency through k-nearest neighbors, which makes it difficult to capture robust context for each correspondence. We propose Cor…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures, accepted by ECAI

  27. arXiv:2408.07890  [pdf, other]

    stat.ML cs.LG

    Local Causal Discovery with Background Knowledge

    Authors: Qingyuan Zheng, Yue Liu, Yangbo He

    Abstract: Causality plays a pivotal role in various fields of study. Based on the framework of causal graphical models, previous works have proposed identifying whether a variable is a cause or non-cause of a target in every Markov equivalent graph solely by learning a local structure. However, the presence of prior knowledge, often represented as a partially known causal graph, is common in many causal mod…

    Submitted 14 August, 2024; originally announced August 2024.

  28. arXiv:2408.06969  [pdf, ps, other]

    cs.NI cs.LG

    IRS-Assisted Lossy Communications Under Correlated Rayleigh Fading: Outage Probability Analysis and Optimization

    Authors: Guanchang Li, Wensheng Lin, Lixin Li, Yixuan He, Fucheng Yang, Zhu Han

    Abstract: This paper focuses on an intelligent reflecting surface (IRS)-assisted lossy communication system with correlated Rayleigh fading. We analyze the correlated channel model and derive the outage probability of the system. Then, we design a deep reinforcement learning (DRL) method to optimize the phase shift of IRS, in order to maximize the received signal power. Moreover, this paper presents results of…

    Submitted 13 August, 2024; originally announced August 2024.

  29. arXiv:2408.06742  [pdf, other]

    cs.CV

    Long-Tailed Out-of-Distribution Detection: Prioritizing Attention to Tail

    Authors: Yina He, Lei Peng, Yongcun Zhang, Juanjuan Weng, Zhiming Luo, Shaozi Li

    Abstract: Current out-of-distribution (OOD) detection methods typically assume balanced in-distribution (ID) data, while most real-world data follow a long-tailed distribution. Previous approaches to long-tailed OOD detection often involve balancing the ID data by reducing the semantics of head classes. However, this reduction can severely affect the classification accuracy of ID data. The main challenge of…

    Submitted 24 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  30. arXiv:2408.04713  [pdf, other]

    cs.LG cs.AI

    DyGMamba: Efficiently Modeling Long-Term Temporal Dependency on Continuous-Time Dynamic Graphs with State Space Models

    Authors: Zifeng Ding, Yifeng Li, Yuan He, Antonio Norelli, Jingcheng Wu, Volker Tresp, Yunpu Ma, Michael Bronstein

    Abstract: Learning useful representations for continuous-time dynamic graphs (CTDGs) is challenging, due to the concurrent need to span long node interaction histories and grasp nuanced temporal details. In particular, two problems emerge: (1) Encoding longer histories requires more computational resources, making it crucial for CTDG models to maintain low computational complexity to ensure efficiency; (2)…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Preprint. Work in progress

  31. Digital Avatars: Framework Development and Their Evaluation

    Authors: Timothy Rupprecht, Sung-En Chang, Yushu Wu, Lei Lu, Enfu Nan, Chih-hsiang Li, Caiyue Lai, Zhimin Li, Zhijun Hu, Yumei He, David Kaeli, Yanzhi Wang

    Abstract: We present a novel prompting strategy for artificial intelligence driven digital avatars. To better quantify how our prompting strategy affects anthropomorphic features like humor, authenticity, and favorability we present Crowd Vote - an adaptation of Crowd Score that allows for judges to elect a large language model (LLM) candidate over competitors answering the same or similar prompts. To visua…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: This work was presented during the IJCAI 2024 conference proceedings for demonstrations

    MSC Class: 68 ACM Class: D.2.2; C.3

    Journal ref: 2024 Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence Demo Track. Pages 8780-8783

  32. arXiv:2408.01923  [pdf, other]

    cs.RO

    Scalable Signal Temporal Logic Guided Reinforcement Learning via Value Function Space Optimization

    Authors: Yiting He, Peiran Liu, Yiding Ji

    Abstract: The integration of reinforcement learning (RL) and formal methods has emerged as a promising framework for solving long-horizon planning problems. Conventional approaches typically involve abstraction of the state and action spaces and manually created labeling functions or predicates. However, the efficiency of these approaches deteriorates as the tasks become increasingly complex, which results…

    Submitted 4 August, 2024; originally announced August 2024.

  33. arXiv:2408.00315  [pdf, other]

    cs.LG cs.AI cs.CV

    ADBM: Adversarial diffusion bridge model for reliable adversarial purification

    Authors: Xiao Li, Wenxuan Sun, Huanran Chen, Qiongxiu Li, Yining Liu, Yingzhe He, Jie Shi, Xiaolin Hu

    Abstract: Recently, Diffusion-based Purification (DiffPure) has been recognized as an effective defense method against adversarial examples. However, we find DiffPure, which directly employs the original pre-trained diffusion models for adversarial purification, to be suboptimal. This is due to an inherent trade-off between noise purification performance and data recovery quality. Additionally, the reliabilit…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 20 pages

  34. arXiv:2408.00296  [pdf, other]

    cs.CV

    Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°

    Authors: Yuxiao He, Yiyu Zhuang, Yanwen Wang, Yao Yao, Siyu Zhu, Xiaoyu Li, Qi Zhang, Xun Cao, Hao Zhu

    Abstract: Creating a 360° parametric model of a human head is a very challenging task. While recent advancements have demonstrated the efficacy of leveraging synthetic data for building such parametric head models, their performance remains inadequate in crucial areas such as expression-driven animation, hairstyle editing, and text-based modifications. In this paper, we build a dataset of artist-designed hi…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  35. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang , et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  36. arXiv:2407.20962  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

    Authors: Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo

    Abstract: Massive multi-modality datasets play a significant role in facilitating the success of large video-language models. However, current video-language datasets primarily provide text descriptions for visual frames, considering audio to be weakly related information. They usually overlook exploring the potential of inherent audio-visual correlation, leading to monotonous annotation within each modalit…

    Submitted 6 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 15 Pages. Dataset report

  37. arXiv:2407.20937  [pdf, other]

    eess.IV cs.CV

    EAR: Edge-Aware Reconstruction of 3-D vertebrae structures from bi-planar X-ray images

    Authors: Lixing Tan, Shuang Song, Yaofeng He, Kangneng Zhou, Tong Lu, Ruoxiu Xiao

    Abstract: X-ray images ease the diagnosis and treatment process due to their rapid imaging speed and high resolution. However, due to the projection process of X-ray imaging, much spatial information has been lost. To accurately provide efficient spinal morphological and structural information, reconstructing the 3-D structures of the spine from the 2-D X-ray images is essential. It is challenging for curre…

    Submitted 4 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 13 pages, 11 figures, 3 tables

  38. arXiv:2407.18625  [pdf, other]

    cs.ET cs.AI cs.NE

    Topology Optimization of Random Memristors for Input-Aware Dynamic SNN

    Authors: Bo Wang, Shaocong Wang, Ning Lin, Yi Li, Yifei Yu, Yue Zhang, Jichang Yang, Xiaoshan Wu, Yangu He, Songqi Wang, Rui Chen, Guoqi Li, Xiaojuan Qi, Zhongrui Wang, Dashan Shang

    Abstract: There is unprecedented development in machine learning, exemplified by recent large language models and world simulators, which are artificial neural networks running on digital computers. However, they still cannot parallel human brains in terms of energy efficiency and the streamlined adaptability to inputs of different difficulties, due to differences in signal representation, optimization, run…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 15 pages, 5 figures

  39. arXiv:2407.18569  [pdf, other]

    cs.RO cs.AI cs.LG

    PP-TIL: Personalized Planning for Autonomous Driving with Instance-based Transfer Imitation Learning

    Authors: Fangze Lin, Ying He, Fei Yu

    Abstract: Personalized motion planning holds significant importance within urban automated driving, catering to the unique requirements of individual users. Nevertheless, prior endeavors have frequently encountered difficulties in simultaneously addressing two crucial aspects: personalized planning within intricate urban settings and enhancing planning performance through data utilization. The challenge ari…

    Submitted 4 August, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: IROS 2024 Accepted

  40. arXiv:2407.17942  [pdf, other]

    cs.RO cs.IT

    A Novel Perception Entropy Metric for Optimizing Vehicle Perception with LiDAR Deployment

    Authors: Yongjiang He, Peng Cao, Zhongling Su, Xiaobo Liu

    Abstract: Developing an effective evaluation metric is crucial for accurately and swiftly measuring LiDAR perception performance. One major issue is the lack of metrics that can simultaneously generate fast and accurate evaluations based on either object detection or point cloud data. In this study, we propose a novel LiDAR perception entropy metric based on the probability of vehicle grid occupancy. This m…

    Submitted 25 July, 2024; originally announced July 2024.

  41. SNNGX: Securing Spiking Neural Networks with Genetic XOR Encryption on RRAM-based Neuromorphic Accelerator

    Authors: Kwunhang Wong, Songqi Wang, Wei Huang, Xinyuan Zhang, Yangu He, Karl M. H. Lai, Yuzhong Jiao, Ning Lin, Xiaojuan Qi, Xiaoming Chen, Zhongrui Wang

    Abstract: Biologically plausible Spiking Neural Networks (SNNs), characterized by spike sparsity, are growing tremendous attention over intellectual edge devices and critical bio-medical applications as compared to artificial neural networks (ANNs). However, there is a considerable risk from malicious attempts to extract white-box information (i.e., weights) from SNNs, as attackers could exploit well-traine…

    Submitted 26 August, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: International Conference on Computer-Aided Design 2024

  42. arXiv:2407.14829  [pdf, other]

    cs.CL

    Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks

    Authors: Jiayu Lin, Guanrong Chen, Bojun Jin, Chenyang Li, Shutong Jia, Wancong Lin, Yang Sun, Yuhang He, Caihua Yang, Jianzhu Bao, Jipeng Wu, Wen Su, Jinglu Chen, Xinyi Li, Tianyu Chen, Mingjie Han, Shuaiwen Du, Zijian Wang, Jiyin Li, Fuzhong Suo, Hao Wang, Nuanchen Lin, Xuanjing Huang, Changjian Jiang, RuiFeng Xu, et al. (4 additional authors not shown)

    Abstract: In this paper we present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023), and introduce the related datasets. We organize two tracks to handle the argumentative generation tasks in different scenarios, namely, Counter-Argument Generation (Track 1) and Claim-based Argument Generation (Track 2). Each track is equipped with its distinct data…

    Submitted 24 July, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

  43. arXiv:2407.14537  [pdf]

    physics.soc-ph cs.SI stat.AP stat.ME

    Small but not least changes: The Art of Creating Disruptive Innovations

    Authors: Youwei He, Jeong-Dong Lee

    Abstract: In the ever-evolving landscape of technology, product innovation thrives on replacing outdated technologies with groundbreaking ones or through the ingenious recombination of existing technologies. Our study embarks on a revolutionary journey by genetically representing products, extracting their chromosomal data, and constructing a comprehensive phylogenetic network of automobiles. We delve deep…

    Submitted 14 July, 2024; originally announced July 2024.

  44. arXiv:2407.14292  [pdf, other]

    cs.CV eess.IV

    Adaptive Frequency Enhancement Network for Single Image Deraining

    Authors: Fei Yan, Yuhong He, Keyu Chen, En Cheng, Jikang Ma

    Abstract: Image deraining aims to improve the visibility of images damaged by rainy conditions, targeting the removal of degradation elements such as rain streaks, raindrops, and rain accumulation. While numerous single image deraining methods have shown promising results in image enhancement within the spatial domain, real-world rain degradation often causes uneven damage across an image's entire frequency…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 8 pages

  45. arXiv:2407.13155  [pdf, other]

    cs.CV

    Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement

    Authors: Yulin He, Wei Chen, Tianci Xun, Yusong Tan

    Abstract: Occupancy prediction plays a pivotal role in autonomous driving (AD) due to the fine-grained geometric perception and general object recognition capabilities. However, existing methods often incur high computational costs, which contradicts the real-time demands of AD. To this end, we first evaluate the speed and memory usage of most public available methods, aiming to redirect the focus from sole…

    Submitted 21 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  46. arXiv:2407.12817  [pdf, other]

    cs.CL cs.SD eess.AS

    Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition

    Authors: Yuchun Shu, Bo Hu, Yifeng He, Hao Shi, Longbiao Wang, Jianwu Dang

    Abstract: Accurately finding the wrong words in the automatic speech recognition (ASR) hypothesis and recovering them well-founded is the goal of speech error correction. In this paper, we propose a non-autoregressive speech error correction method. A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses as the reference to find the wrong word position. Besides, the acoustic f…

    Submitted 29 June, 2024; originally announced July 2024.

  47. arXiv:2407.11913  [pdf, other]

    cs.CV cs.LG

    Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data

    Authors: Tim Elsner, Paula Usinger, Victor Czech, Gregor Kobsik, Yanjiang He, Isaak Lim, Leif Kobbelt

    Abstract: In quantised autoencoders, images are usually split into local patches, each encoded by one token. This representation is redundant in the sense that the same number of tokens is spend per region, regardless of the visual information content in that region. Adaptive discretisation schemes like quadtrees are applied to allocate tokens for patches with varying sizes, but this just varies the region…

    Submitted 5 August, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  48. arXiv:2407.10988  [pdf, other]

    cs.LG

    Residual resampling-based physics-informed neural network for neutron diffusion equations

    Authors: Heng Zhang, Yun-Ling He, Dong Liu, Qin Hang, He-Min Yao, Di Xiang

    Abstract: The neutron diffusion equation plays a pivotal role in the analysis of nuclear reactors. Nevertheless, employing the Physics-Informed Neural Network (PINN) method for its solution entails certain limitations. Traditional PINN approaches often utilize fully connected network (FCN) architecture, which is susceptible to overfitting, training instability, and gradient vanishing issues as the network d…

    Submitted 23 June, 2024; originally announced July 2024.

  49. arXiv:2407.10688  [pdf, other]

    cs.LG

    Probability Passing for Graph Neural Networks: Graph Structure and Representations Joint Learning

    Authors: Ziyan Wang, YaXuan He, Bin Liu

    Abstract: Graph Neural Networks (GNNs) have achieved notable success in the analysis of non-Euclidean data across a wide range of domains. However, their applicability is constrained by the dependence on the observed graph structure. To solve this problem, Latent Graph Inference (LGI) is proposed to infer a task-specific latent structure by computing similarity or edge probability of node features and then…

    Submitted 15 July, 2024; originally announced July 2024.

  50. arXiv:2407.10281  [pdf, other]

    cs.CV

    Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning

    Authors: Xinyuan Gao, Songlin Dong, Yuhang He, Qiang Wang, Yihong Gong

    Abstract: The problem of Rehearsal-Free Continual Learning (RFCL) aims to continually learn new knowledge while preventing forgetting of the old knowledge, without storing any old samples and prototypes. The latest methods leverage large-scale pre-trained models as the backbone and use key-query matching to generate trainable prompts to learn new knowledge. However, the domain gap between the pre-training d…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV2024