Showing 1–50 of 934 results for author: Tang, Y

Searching in archive cs.
  1. arXiv:2409.03421  [pdf]

    cs.RO

    F3T: A soft tactile unit with 3D force and temperature mathematical decoupling ability for robots

    Authors: Xiong Yang, Hao Ren, Dong Guo, Zhengrong Ling, Tieshan Zhang, Gen Li, Yifeng Tang, Haoxiang Zhao, Jiale Wang, Hongyuan Chang, Jia Dong, Yajing Shen

    Abstract: The human skin exhibits a remarkable capability to perceive contact forces and environmental temperatures, providing intricate information essential for nuanced manipulation. Despite recent advancements in soft tactile sensors, a significant challenge remains in accurately decoupling signals - specifically, separating force from directional orientation and temperature - resulting in a failure to meet the…

    Submitted 5 September, 2024; originally announced September 2024.

  2. arXiv:2409.02727  [pdf, other]

    cs.CL cs.IR

    Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?

    Authors: Yixuan Tang, Yi Yang

    Abstract: The significant advancements of Large Language Models (LLMs) in generative tasks have led to a growing body of work exploring LLM-based embedding models. While these models, employing different pooling and attention strategies, have achieved state-of-the-art performance on public embedding benchmarks, questions still arise about what constitutes an effective design for LLM-based embedding models.…

    Submitted 5 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: https://github.com/yixuantt/PoolingAndAttn
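
The abstract contrasts pooling strategies for turning per-token LLM hidden states into a single embedding. As a generic illustration only (not the paper's models or code), the sketch below compares two common choices, mean pooling over non-padding tokens and last-token pooling; the function names and toy hidden states are hypothetical, and last-token pooling assumes right padding.

```python
from typing import List

Vector = List[float]


def mean_pool(hidden: List[Vector], mask: List[int]) -> Vector:
    """Average the hidden states of real tokens (mask == 1), ignoring padding."""
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden, mask):
        if keep:
            for i, x in enumerate(vec):
                total[i] += x
            count += 1
    if count == 0:
        raise ValueError("mask selects no tokens")
    return [t / count for t in total]


def last_token_pool(hidden: List[Vector], mask: List[int]) -> Vector:
    """Return the hidden state of the last real token (assumes right padding)."""
    last = max(i for i, keep in enumerate(mask) if keep)
    return list(hidden[last])


# Toy hidden states for a 4-position sequence whose final position is padding.
hidden = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(mean_pool(hidden, mask))        # [3.0, 2.0]
print(last_token_pool(hidden, mask))  # [5.0, 4.0]
```

Mean pooling spreads the embedding over every real token, while last-token pooling relies on the causal model having summarized the sequence in its final position; which works better is the kind of design question the paper studies.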

  3. arXiv:2409.01780  [pdf, other]

    cs.CL

    State-of-the-art Advances of Deep-learning Linguistic Steganalysis Research

    Authors: Yihao Wang, Ru Zhang, Yifan Tang, Jianyi Liu

    Abstract: With the evolution of generative linguistic steganography techniques, conventional steganalysis falls short in robustly quantifying the alterations induced by steganography, thereby complicating detection. Consequently, the research paradigm has pivoted towards deep-learning-based linguistic steganalysis. This study offers a comprehensive review of existing contributions and evaluates prevailing d…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by 2023 International Conference on Data, Information and Computing Science

    Report number: no. 316

  4. arXiv:2409.00839  [pdf, other]

    cs.CV cs.AI cs.IT

    Entropy Loss: An Interpretability Amplifier of 3D Object Detection Network for Intelligent Driving

    Authors: Haobo Yang, Shiyan Zhang, Zhuoyi Yang, Xinyu Zhang, Li Wang, Yifan Tang, Jilong Guo, Jun Li

    Abstract: With the increasing complexity of the traffic environment, the significance of safety perception in intelligent driving is intensifying. Traditional methods in the field of intelligent driving perception rely on deep learning, which suffers from limited interpretability, often described as a "black box." This paper introduces a novel type of loss function, termed "Entropy Loss," along with an inno…

    Submitted 1 September, 2024; originally announced September 2024.

  5. arXiv:2408.15966  [pdf, other]

    cs.CV cs.AI cs.CL

    More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding

    Authors: Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Jinfeng Xu, Yixue Hao, Long Hu, Min Chen

    Abstract: Enabling Large Language Models (LLMs) to comprehend the 3D physical world remains a significant challenge. Due to the lack of large-scale 3D-text pair datasets, the success of LLMs has yet to be replicated in 3D understanding. In this paper, we rethink this issue and propose a new task: 3D Data-Efficient Point-Language Understanding. The goal is to enable LLMs to achieve robust 3D object understan…

    Submitted 5 September, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  6. arXiv:2408.15710  [pdf, other]

    cs.CL

    Conan-embedding: General Text Embedding with More and Better Negative Samples

    Authors: Shiyu Li, Yang Tang, Shizhe Chen, Xi Chen

    Abstract: With the growing popularity of RAG, the capabilities of embedding models are gaining increasing attention. Embedding models are primarily trained through contrastive loss learning, with negative examples being a key component. Previous work has proposed various hard negative mining strategies, but these strategies are typically employed as preprocessing steps. In this paper, we propose the conan-e…

    Submitted 29 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.
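
Since the abstract notes that embedding models are trained with a contrastive loss in which negative examples are a key component, here is a minimal sketch of a generic InfoNCE-style loss with explicit negatives. It is not the conan-embedding method; the cosine similarity, the temperature of 0.05, and the toy vectors are assumptions chosen for illustration.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(query: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.05) -> float:
    """Cross-entropy of picking the positive among positive + negatives."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    peak = max(logits)  # subtract the max before exp() for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[0]


query = [1.0, 0.0]
positive = [1.0, 0.0]
easy_negative = [0.0, 1.0]   # clearly unrelated to the query
hard_negative = [0.9, 0.1]   # close to the query, hence "hard"
print(info_nce(query, positive, [easy_negative]))
print(info_nce(query, positive, [hard_negative]))
```

The hard negative produces the larger loss and therefore a stronger training signal, which is why how negatives are chosen matters so much in this setting.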

  7. arXiv:2408.15558  [pdf, ps, other]

    cs.IT

    New quantum codes from constacyclic codes over finite chain rings

    Authors: Yongsheng Tang, Ting Yao, Heqian Xu, Xiaoshan Kai

    Abstract: Let $R$ be the finite chain ring $\mathbb{F}_{p^{2m}}+{u}\mathbb{F}_{p^{2m}}$, where $\mathbb{F}_{p^{2m}}$ is the finite field with $p^{2m}$ elements, $p$ is a prime, $m$ is a non-negative integer and ${u}^{2}=0.$ In this paper, we firstly define a class of Gray maps, which changes the Hermitian self-orthogonal property of linear codes over $\mathbb{F}_{2^{2m}}+{u}\mathbb{F}_{2^{2m}}$ into the H…

    Submitted 28 August, 2024; originally announced August 2024.
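
As a worked illustration of the ring named in the abstract (standard facts about $R$, not results from the paper, and taking $m \geq 1$ so that $\mathbb{F}_{p^{2m}}$ is a genuine field): every element can be written uniquely as $a + ub$ with $a, b \in \mathbb{F}_{p^{2m}}$, and the relation $u^2 = 0$ fixes the multiplication.

```latex
\begin{aligned}
(a + ub) + (c + ud) &= (a + c) + u(b + d), \\
(a + ub)(c + ud)    &= ac + u(ad + bc) + u^{2}bd = ac + u(ad + bc), \\
|R|                 &= \left(p^{2m}\right)^{2} = p^{4m}.
\end{aligned}
```

Consequently $a + ub$ is a unit exactly when $a \neq 0$ (its inverse is $a^{-1} - u\,a^{-2}b$), the non-units are the elements $ub$ of the ideal $uR$ (each squares to zero), and the ideals form the chain $R \supsetneq uR \supsetneq \{0\}$, which is why $R$ is called a finite chain ring.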

  8. arXiv:2408.13983  [pdf, other]

    cs.CV

    Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation

    Authors: Yushun Tang, Shuoshuo Chen, Zhihe Lu, Xinchao Wang, Zhihai He

    Abstract: Transformer-based methods have achieved remarkable success in various machine learning tasks. How to design efficient test-time adaptation methods for transformer models becomes an important research task. In this work, motivated by the dual-subband wavelet lifting scheme developed in multi-scale signal processing which is able to efficiently separate the input signals into principal components an…

    Submitted 25 August, 2024; originally announced August 2024.

  9. arXiv:2408.12664  [pdf, other]

    cs.AI q-bio.NC

    Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience

    Authors: Zhonghao He, Jascha Achterberg, Katie Collins, Kevin Nejad, Danyal Akarca, Yinzhu Yang, Wes Gurnee, Ilia Sucholutsky, Yuhan Tang, Rebeca Ianov, George Ogden, Chole Li, Kai Sandbrink, Stephen Casper, Anna Ivanova, Grace W. Lindsay

    Abstract: As deep learning systems are scaled up to many billions of parameters, relating their internal structure to external behaviors becomes very challenging. Although daunting, this problem is not new: Neuroscientists and cognitive scientists have accumulated decades of experience analyzing a particularly complex system - the brain. In this work, we argue that interpreting both biological and artificia…

    Submitted 25 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  10. arXiv:2408.12419  [pdf, other]

    cs.LG cs.AI

    4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Motion Alignment

    Authors: Kaihui Cheng, Ce Liu, Qingkun Su, Jun Wang, Liwei Zhang, Yining Tang, Yao Yao, Siyu Zhu, Yuan Qi

    Abstract: Protein structure prediction is pivotal for understanding the structure-function relationship of proteins, advancing biological research, and facilitating pharmaceutical development and experimental design. While deep learning methods and the expanded availability of experimental 3D protein structures have accelerated structure prediction, the dynamic nature of protein structures has received limi…

    Submitted 22 August, 2024; originally announced August 2024.

  11. arXiv:2408.12413  [pdf, other]

    q-bio.BM cs.AI

    Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures

    Authors: Ce Liu, Jun Wang, Zhiqiang Cai, Yingxu Wang, Huizhen Kuang, Kaihui Cheng, Liwei Zhang, Qingkun Su, Yining Tang, Fenglei Cao, Limei Han, Siyu Zhu, Yuan Qi

    Abstract: Despite significant progress in static protein structure collection and prediction, the dynamic behavior of proteins, one of their most vital characteristics, has been largely overlooked in prior research. This oversight can be attributed to the limited availability, diversity, and heterogeneity of dynamic protein datasets. To address this gap, we propose to enhance existing prestigious static 3D…

    Submitted 4 September, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  12. arXiv:2408.12095  [pdf, other]

    cs.CL cs.AI cs.LG

    uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization

    Authors: Aishik Nagar, Yutong Liu, Andy T. Liu, Viktor Schlegel, Vijay Prakash Dwivedi, Arun-Kumar Kaliya-Perumal, Guna Pratheep Kalanchiam, Yili Tang, Robby T. Tan

    Abstract: Medical abstractive summarization faces the challenge of balancing faithfulness and informativeness. Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness. While recent advancements in techniques like in-context learning (ICL) and fine-tuning have improved medical summarization, they often overlook crucial aspects such as fai…

    Submitted 25 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 12 pages

  13. arXiv:2408.12009  [pdf, other]

    cs.CV

    CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion

    Authors: Yunlong Tang, Gen Zhan, Li Yang, Yiting Liao, Chenliang Xu

    Abstract: Video saliency prediction aims to identify the regions in a video that attract human attention and gaze, driven by bottom-up features from the video and top-down processes like memory and cognition. Among these top-down influences, language plays a crucial role in guiding attention by shaping how visual information is interpreted. Existing methods primarily focus on modeling perceptual information…

    Submitted 21 August, 2024; originally announced August 2024.

  14. arXiv:2408.11210  [pdf, other]

    cs.CV

    A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation

    Authors: Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Daguang Xu, Wenqi Li

    Abstract: Since the release of Segment Anything 2 (SAM2), the medical imaging community has been actively evaluating its performance for 3D medical image segmentation. However, different studies have employed varying evaluation pipelines, resulting in conflicting outcomes that obscure a clear understanding of SAM2's capabilities and potential applications. We briefly review existing benchmarks and point out…

    Submitted 20 August, 2024; originally announced August 2024.

  15. arXiv:2408.10600  [pdf]

    cs.CV cs.AI

    Breast tumor classification based on self-supervised contrastive learning from ultrasound videos

    Authors: Yunxin Tang, Siyuan Tang, Jian Zhang, Hao Chen

    Abstract: Background: Breast ultrasound is prominently used in diagnosing breast tumors. At present, many automatic systems based on deep learning have been developed to help radiologists in diagnosis. However, training such systems remains challenging because they are usually data-hungry and demand large amounts of labeled data, which require professional knowledge and are expensive. Methods: We adopted a triplet n…

    Submitted 20 August, 2024; originally announced August 2024.

  16. arXiv:2408.06798  [pdf, other]

    cs.CV

    Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning

    Authors: Shibo Jie, Yehui Tang, Jianyuan Guo, Zhi-Hong Deng, Kai Han, Yunhe Wang

    Abstract: Token compression expedites the training and inference of Vision Transformers (ViTs) by reducing the number of redundant tokens, e.g., pruning inattentive tokens or merging similar tokens. However, when applied to downstream tasks, these approaches suffer from a significant performance drop when the compression degrees are mismatched between training and inference stages, which limits the applic…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV2024
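
The abstract cites pruning inattentive tokens as one form of token compression. A minimal, generic sketch of that idea (not the Token Compensator itself) follows; it assumes the first token is a [CLS] token and that its attention to every token is available as a list, and both the function name and the attention scores are hypothetical.

```python
from typing import List, Sequence


def prune_tokens(tokens: Sequence[str], cls_attention: Sequence[float],
                 keep: int) -> List[str]:
    """Keep token 0 (assumed to be [CLS]) plus the `keep` patch tokens that
    receive the highest [CLS] attention, preserving their original order."""
    patch_ids = range(1, len(tokens))
    ranked = sorted(patch_ids, key=lambda i: cls_attention[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [tokens[0]] + [tokens[i] for i in kept]


tokens = ["cls", "p1", "p2", "p3", "p4"]
attn = [0.0, 0.10, 0.40, 0.05, 0.45]   # attention from [CLS] to each token
print(prune_tokens(tokens, attn, keep=2))  # ['cls', 'p2', 'p4']
```

Here `keep` plays the role of the compression degree; running a model with a different `keep` at inference than it saw during training is the kind of mismatch the abstract describes.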

  17. arXiv:2408.05981  [pdf, other]

    cs.RO

    CAD-Mesher: A Convenient, Accurate, Dense Mesh-based Mapping Module in SLAM for Dynamic Environments

    Authors: Yanpeng Jia, Fengkui Cao, Ting Wang, Yandong Tang, Shiliang Shao, Lianqing Liu

    Abstract: Most LiDAR odometry and SLAM systems construct maps in point clouds, which are discrete and sparse when zoomed in, making them not directly suitable for navigation. Mesh maps represent a dense and continuous map format with low memory consumption, which can approximate complex structures with simple elements, attracting significant attention of researchers in recent years. However, most implementa…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 9 pages, 7 figures

  18. arXiv:2408.05524  [pdf, other]

    cs.CL cs.DB

    Context-Driven Index Trimming: A Data Quality Perspective to Enhancing Precision of RALMs

    Authors: Kexin Ma, Ruochun Jin, Xi Wang, Huan Chen, Jing Ren, Yuhua Tang

    Abstract: Retrieval-Augmented Large Language Models (RALMs) have made significant strides in enhancing the accuracy of generated responses. However, existing research often overlooks the data quality issues within retrieval results, often caused by inaccurate existing vector-distance-based retrieval methods. We propose to boost the precision of RALMs' answers from a data quality perspective through the Contex…

    Submitted 10 August, 2024; originally announced August 2024.
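
For context, the vector-distance-based retrieval that the abstract critiques typically ranks passages by embedding similarity alone. The sketch below is a generic illustration under an assumed cosine metric and toy embeddings, not the paper's Context-Driven Index Trimming; the names `retrieve` and `index` are hypothetical. Note that nothing in it filters results beyond their similarity score.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query: Vector, index: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Rank every indexed passage by cosine similarity to the query; return the top k."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


index = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.7, 0.0, 0.7],
}
print([doc for doc, _ in retrieve([1.0, 0.0, 0.0], index, k=2)])  # ['doc_a', 'doc_c']
```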

  19. arXiv:2408.04227  [pdf, other]

    eess.IV cs.CV

    Physical prior guided cooperative learning framework for joint turbulence degradation estimation and infrared video restoration

    Authors: Ziran Zhang, Yuhang Tang, Zhigang Wang, Yueting Chen, Bin Zhao

    Abstract: Infrared imaging and turbulence strength measurements are in widespread demand in many fields. This paper introduces a Physical Prior Guided Cooperative Learning (P2GCL) framework to jointly enhance atmospheric turbulence strength estimation and infrared image restoration. P2GCL involves a cyclic collaboration between two models, i.e., a TMNet measures turbulence strength and outputs the refractiv…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 21

  20. arXiv:2408.00754  [pdf, other]

    cs.CV cs.LG

    Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

    Authors: Benlin Liu, Yuhao Dong, Yiqin Wang, Yongming Rao, Yansong Tang, Wei-Chiu Ma, Ranjay Krishna

    Abstract: Multimodal language models (MLLMs) are increasingly being implemented in real-world environments, necessitating their ability to interpret 3D spaces and comprehend temporal dynamics. Despite their potential, current top models within our community still fall short in adequately understanding spatial and temporal dimensions. We introduce Coarse Correspondence, a simple, training-free, effective, an…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: project page: https://coarse-correspondence.github.io

  21. arXiv:2408.00346  [pdf, other]

    cs.LG cs.AI

    Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerce

    Authors: Houye Ji, Ye Tang, Zhaoxin Chen, Lixi Deng, Jun Hu, Lei Su

    Abstract: With the rapid development of the short video industry, traditional e-commerce has encountered a new paradigm, video-driven e-commerce, which leverages attractive videos for product showcases and provides both video and item services for users. Benefitting from the dynamic and visualized introduction of items, video-driven e-commerce has shown huge potential in stimulating consumer confidence and p…

    Submitted 1 August, 2024; originally announced August 2024.

  22. arXiv:2407.21369  [pdf, other]

    cs.SE

    An LLM-based Readability Measurement for Unit Tests' Context-aware Inputs

    Authors: Zhichao Zhou, Yutian Tang, Yun Lin, Jingzhu He

    Abstract: Automated test techniques usually generate unit tests with higher code coverage than manual tests. However, the readability of automated tests is crucial for code comprehension and maintenance. The readability of unit tests involves many aspects. In this paper, we focus on test inputs. The central limitation of existing studies on input readability is that they focus on test codes alone without ta…

    Submitted 18 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  23. arXiv:2407.20730  [pdf, other]

    cs.CV

    Autogenic Language Embedding for Coherent Point Tracking

    Authors: Zikai Song, Ying Tang, Run Luo, Lintao Ma, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang

    Abstract: Point tracking is a challenging task in computer vision, aiming to establish point-wise correspondence across long video sequences. Recent advancements have primarily focused on temporal modeling techniques to improve local feature similarity, often overlooking the valuable semantic consistency inherent in tracked points. In this paper, we introduce a novel approach leveraging language embeddings…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: accepted by ACM MM 2024

  24. arXiv:2407.20171  [pdf, other]

    cs.CV

    Diffusion Feedback Helps CLIP See Better

    Authors: Wenxuan Wang, Quan Sun, Fan Zhang, Yepeng Tang, Jing Liu, Xinlong Wang

    Abstract: Contrastive Language-Image Pre-training (CLIP), which excels at abstracting open-world representations across domains and modalities, has become a foundation for a variety of vision and multimodal tasks. However, recent studies reveal that CLIP has severe visual shortcomings; for example, it can hardly distinguish orientation, quantity, color, structure, etc. These visual shortcomings also limit the…

    Submitted 23 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

  25. arXiv:2407.18039  [pdf, other]

    cs.LG cs.AI

    Peak-Controlled Logits Poisoning Attack in Federated Distillation

    Authors: Yuhan Tang, Aoxu Zhang, Zhiyuan Wu, Bo Gao, Tian Wen, Yuwei Wang, Sheng Sun

    Abstract: Federated Distillation (FD) offers an innovative approach to distributed machine learning, leveraging knowledge distillation for efficient and flexible cross-device knowledge transfer without necessitating the upload of extensive model parameters to a central server. While FD has gained popularity, its vulnerability to poisoning attacks remains underexplored. To address this gap, we previously int…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.03685

  26. arXiv:2407.16697  [pdf, other]

    cs.CV

    AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking

    Authors: Wenxuan Li, Chongyu Qu, Xiaoxi Chen, Pedro R. A. S. Bassi, Yijia Shi, Yuxiang Lai, Qian Yu, Huimin Xue, Yixiong Chen, Xiaorui Lin, Yutong Tang, Yining Cao, Haoqi Han, Zheyuan Zhang, Jiawei Liu, Tiezheng Zhang, Yujiu Ma, Jincheng Wang, Guang Zhang, Alan Yuille, Zongwei Zhou

    Abstract: We introduce the largest abdominal CT dataset (termed AbdomenAtlas) of 20,460 three-dimensional CT volumes sourced from 112 hospitals across diverse populations, geographies, and facilities. AbdomenAtlas provides 673K high-quality masks of anatomical structures in the abdominal region annotated by a team of 10 radiologists with the help of AI algorithms. We start by having expert radiologists manu…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Published in Medical Image Analysis

  27. arXiv:2407.15613  [pdf, other]

    cs.CV

    Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning

    Authors: Xiangyan Qu, Jing Yu, Keke Gai, Jiamin Zhuang, Yuanmin Tang, Gang Xiong, Gaopeng Gou, Qi Wu

    Abstract: Recent work shows that documents from encyclopedias serve as helpful auxiliary information for zero-shot learning. Existing methods align the entire semantics of a document with corresponding images to transfer knowledge. However, they disregard that semantic information is not equivalent between them, resulting in a suboptimal alignment. In this work, we propose a novel network to extract multi-v…

    Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted to ACM International Conference on Multimedia (MM) 2024

  28. arXiv:2407.12051  [pdf, other]

    q-bio.GN cs.AI cs.LG

    Dy-mer: An Explainable DNA Sequence Representation Scheme using Sparse Recovery

    Authors: Zhiyuan Peng, Yuanbo Tang, Yang Li

    Abstract: DNA sequences encode vital genetic and biological information, yet these unfixed-length sequences cannot serve as the input of common data mining algorithms. Hence, various representation schemes have been developed to transform DNA sequences into fixed-length numerical representations. However, these schemes face difficulties in learning high-quality representations due to the complexity and spar…

    Submitted 6 July, 2024; originally announced July 2024.

  29. arXiv:2407.11504  [pdf, other]

    cs.IR

    Bootstrapped Pre-training with Dynamic Identifier Prediction for Generative Retrieval

    Authors: Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

    Abstract: Generative retrieval uses differentiable search indexes to directly generate relevant document identifiers in response to a query. Recent studies have highlighted the potential of a strong generative retrieval model, trained with carefully crafted pre-training tasks, to enhance downstream retrieval tasks via fine-tuning. However, the full power of pre-training for generative retrieval remains unde…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL Findings 2024

  30. arXiv:2407.11480  [pdf, other]

    cs.LG cs.AI

    AIGC for Industrial Time Series: From Deep Generative Models to Large Generative Models

    Authors: Lei Ren, Haiteng Wang, Yang Tang, Chunhua Yang

    Abstract: With the remarkable success of generative models like ChatGPT, Artificial Intelligence Generated Content (AIGC) is undergoing explosive development. Not limited to text and images, generative models can generate industrial time series data, addressing challenges such as the difficulty of data collection and data annotation. Due to their outstanding generation ability, they have been widely used in…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 17 pages, 4 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

  31. arXiv:2407.10627  [pdf, other]

    cs.CL cs.AI cs.LG

    Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena

    Authors: Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Qingwei Lin, Jianguang Lou, Shifeng Chen, Yansong Tang, Weizhu Chen

    Abstract: Assessing the effectiveness of large language models (LLMs) presents substantial challenges. The method of conducting human-annotated battles in an online Chatbot Arena is a highly effective evaluative technique. However, this approach is limited by the costs and time required for human annotation. In this paper, we introduce Arena Learning, an innovative offline strategy designed to simulate thes…

    Submitted 15 July, 2024; originally announced July 2024.

  32. arXiv:2407.10332  [pdf, other]

    cs.CY cs.LG cs.MA

    Ontology-driven Reinforcement Learning for Personalized Student Support

    Authors: Ryan Hare, Ying Tang

    Abstract: In the search for more effective education, there is a widespread effort to develop better approaches to personalize student education. Unassisted, educators often do not have time or resources to personally support every student in a given classroom. Motivated by this issue, and by recent advancements in artificial intelligence, this paper presents a general-purpose framework for personalized stu…

    Submitted 5 September, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures, in press for IEEE Systems, Man, and Cybernetics 2024 Conference

  33. arXiv:2407.10068  [pdf, other]

    cs.CL

    Multi-Granularity Semantic Revision for Large Language Model Distillation

    Authors: Xiaoyu Liu, Yun Zhang, Wei Li, Simiao Li, Xudong Huang, Hanting Chen, Yehui Tang, Jie Hu, Zhiwei Xiong, Yunhe Wang

    Abstract: Knowledge distillation plays a key role in compressing the Large Language Models (LLMs), which boosts a small-size student model under large teacher models' guidance. However, existing LLM distillation methods overly rely on student-generated outputs, which may introduce generation errors and misguide the distillation process. Moreover, the distillation loss functions introduced in previous art st…

    Submitted 13 July, 2024; originally announced July 2024.

  34. arXiv:2407.09417  [pdf, other]

    cs.CL cs.IR

    Mitigating Entity-Level Hallucination in Large Language Models

    Authors: Weihang Su, Yichen Tang, Qingyao Ai, Changyue Wang, Zhijing Wu, Yiqun Liu

    Abstract: The emergence of Large Language Models (LLMs) has revolutionized how users access information, shifting from traditional search engines to direct question-and-answer interactions with LLMs. However, the widespread adoption of LLMs has revealed a significant challenge known as hallucination, wherein LLMs generate coherent yet factually inaccurate responses. This hallucination phenomenon has led to…

    Submitted 22 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

  35. arXiv:2407.08919  [pdf, other]

    cs.NI cs.ET eess.SP

    Redefinition of Digital Twin and its Situation Awareness Framework Designing Towards Fourth Paradigm for Energy Internet of Things

    Authors: Xing He, Yuezhong Tang, Shuyan Ma, Qian Ai, Fei Tao, Robert Qiu

    Abstract: Traditional knowledge-based situation awareness (SA) modes struggle to adapt to the escalating complexity of today's Energy Internet of Things (EIoT), necessitating a pivotal paradigm shift. In response, this work introduces a pioneering data-driven SA framework, termed digital twin-based situation awareness (DT-SA), aiming to bridge existing gaps between data and demands, and further to enhance S…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 16 pages, 15 figures. Accepted by IEEE Transactions on Systems, Man and Cybernetics: Systems

  36. arXiv:2407.08035  [pdf, other]

    cs.CL cs.IR

    FsPONER: Few-shot Prompt Optimization for Named Entity Recognition in Domain-specific Scenarios

    Authors: Yongjian Tang, Rakebul Hasan, Thomas Runkler

    Abstract: Large Language Models (LLMs) have provided a new pathway for Named Entity Recognition (NER) tasks. Compared with fine-tuning, LLM-powered prompting methods avoid the need for training, conserve substantial computational resources, and rely on minimal annotated data. Previous studies have achieved comparable performance to fully supervised BERT-based fine-tuning approaches on general NER benchmarks…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: accepted for publication at the 27th European Conference on Artificial Intelligence (ECAI-2024)

  37. arXiv:2407.07468  [pdf, other]

    cs.CV

    Rethinking Few-shot Class-incremental Learning: Learning from Yourself

    Authors: Yu-Ming Tang, Yi-Xing Peng, Jingke Meng, Wei-Shi Zheng

    Abstract: Few-shot class-incremental learning (FSCIL) aims to learn sequential classes with limited samples in a few-shot fashion. Inherited from the classical class-incremental learning setting, the popular benchmark of FSCIL uses averaged accuracy (aAcc) and last-task averaged accuracy (lAcc) as the evaluation metrics. However, we reveal that such evaluation metrics may not provide adequate emphasis on th…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  38. arXiv:2407.06938  [pdf, other]

    cs.CV

    RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models

    Authors: Bowen Zhang, Yiji Cheng, Chunyu Wang, Ting Zhang, Jiaolong Yang, Yansong Tang, Feng Zhao, Dong Chen, Baining Guo

    Abstract: We present RodinHD, which can generate high-fidelity 3D avatars from a portrait image. Existing methods fail to capture intricate details such as hairstyles which we tackle in this paper. We first identify an overlooked problem of catastrophic forgetting that arises when fitting triplanes sequentially on many avatars, caused by the MLP decoder sharing scheme. To overcome this issue, we raise a nov…

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; project page: https://rodinhd.github.io/

  39. arXiv:2407.06513  [pdf, other]

    cs.CV

    Computer vision tasks for intelligent aerospace missions: An overview

    Authors: Huilin Chen, Qiyu Sun, Fangfei Li, Yang Tang

    Abstract: Computer vision tasks are crucial for aerospace missions as they help spacecraft to understand and interpret the space environment, such as estimating position and orientation, reconstructing 3D models, and recognizing objects, which have been extensively studied to successfully carry out the missions. However, traditional methods like Kalman Filtering, Structure from Motion, and Multi-View Stereo…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 23 pages, 7 figures, journal

  40. arXiv:2407.05563  [pdf, other]

    cs.CL

    LLMBox: A Comprehensive Library for Large Language Models

    Authors: Tianyi Tang, Yiwen Hu, Bingqian Li, Wenyang Luo, Zijing Qin, Haoxiang Sun, Jiapeng Wang, Shiyi Xu, Xiaoxue Cheng, Geyang Guo, Han Peng, Bowen Zheng, Yiru Tang, Yingqian Min, Yushuo Chen, Jie Chen, Yuanqian Zhao, Luran Ding, Yuhao Wang, Zican Dong, Chunxuan Xia, Junyi Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: To facilitate the research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library features three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets,…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 Demo

  41. arXiv:2407.05396  [pdf, other]

    cs.CR cs.AI

    Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense

    Authors: Qi Zhou, Zipeng Ye, Yubo Tang, Wenjian Luo, Yuhui Shi, Yan Jia

    Abstract: Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition. However, DNN models are vulnerable to backdoor attacks. A backdoor in a DNN model can be activated by a poisoned input containing a trigger, leading to wrong predictions, which causes serious security issues in applications. It is challenging for current defenses to eliminate the backdoor effective…

    Submitted 14 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures

  42. arXiv:2407.04622  [pdf, other]

    cs.LG

    On scalable oversight with weak LLMs judging strong LLMs

    Authors: Zachary Kenton, Noah Y. Siegel, János Kramár, Jonah Brown-Cohen, Samuel Albanie, Jannis Bulian, Rishabh Agarwal, David Lindner, Yunhao Tang, Noah D. Goodman, Rohin Shah

    Abstract: Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AI's compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI a…

    Submitted 12 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 15 pages (53 including appendices). V2: minor correction to Figure 3; add Figure A.9 comparing open vs assigned consultancy; add a reference

  43. arXiv:2407.04381  [pdf, other]

    cs.CV cs.AI

    Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection

    Authors: Zhiqiang Yang, Qiu Guan, Keer Zhao, Jianmin Yang, Xinli Xu, Haixia Long, Ying Tang

    Abstract: Due to the effective performance of multi-scale feature fusion, Path Aggregation FPN (PAFPN) is widely employed in YOLO detectors. However, it cannot efficiently and adaptively integrate high-level semantic information with low-level spatial information simultaneously. We propose a new model named MAF-YOLO in this paper, which is a novel object detection framework with a versatile neck named Multi…

    Submitted 5 July, 2024; originally announced July 2024.

  44. arXiv:2407.03543  [pdf, ps, other]

    cs.CR

    Asymmetric Mempool DoS Security: Formal Definitions and Provable Secure Designs

    Authors: Wanning Ding, Yibo Wang, Yuzhe Tang

    Abstract: The mempool plays a crucial role in blockchain systems as a buffer zone for pending transactions before they are executed and included in a block. However, existing works primarily focus on mitigating defenses against already identified real-world attacks. This paper introduces secure blockchain-mempool designs capable of defending against any form of asymmetric eviction DoS attacks. We establish…

    Submitted 24 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  45. arXiv:2407.03307  [pdf, other]

    eess.IV cs.CV

    HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization

    Authors: Yucheng Tang, Yufan He, Vishwesh Nath, Pengfeig Guo, Ruining Deng, Tianyuan Yao, Quan Liu, Can Cui, Mengmeng Yin, Ziyue Xu, Holger Roth, Daguang Xu, Haichun Yang, Yuankai Huo

    Abstract: In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSI) into smaller patches (e.g., 256x256, 512x512, 1024x1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this…

    Submitted 3 July, 2024; originally announced July 2024.

  46. arXiv:2407.02345  [pdf, other]

    cs.CL

    MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space

    Authors: Yihong Tang, Bo Wang, Dongming Zhao, Xiaojia Jin, Jijun Zhang, Ruifang He, Yuexian Hou

    Abstract: Personalized Dialogue Generation (PDG) aims to create coherent responses according to roles or personas. Traditional PDG relies on external role data, which can be scarce and raise privacy concerns. Approaches address these issues by extracting role information from dialogue history, which often fail to generically model roles in continuous space. To overcome these limitations, we introduce a nove…

    Submitted 2 July, 2024; originally announced July 2024.

  47. arXiv:2407.01414  [pdf, other]

    cs.CV

    StyleShot: A Snapshot on Any Style

    Authors: Junyao Gao, Yanchen Liu, Yanan Sun, Yinhao Tang, Yanhong Zeng, Kai Chen, Cairong Zhao

    Abstract: In this paper, we show that, a good style representation is crucial and sufficient for generalized style transfer without test-time tuning. We achieve this through constructing a style-aware encoder and a well-organized style dataset called StyleGallery. With dedicated design for style learning, this style-aware encoder is trained to extract expressive style representation with decoupling training…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: project page: https://styleshot.github.io/

  48. arXiv:2407.01007  [pdf, other]

    cs.CV

    GMT: A Robust Global Association Model for Multi-Target Multi-Camera Tracking

    Authors: Huijie Fan, Tinghui Zhao, Qiang Wang, Baojie Fan, Yandong Tang, LianQing Liu

    Abstract: In the task of multi-target multi-camera (MTMC) tracking of pedestrians, the data association problem is a key issue and main challenge, especially with complications arising from camera movements, lighting variations, and obstructions. However, most MTMC models adopt two-step approaches, thus heavily depending on the results of the first-step tracking in practical applications. Moreover, the same…

    Submitted 1 July, 2024; originally announced July 2024.

  49. arXiv:2407.00871  [pdf, other]

    cs.DC cs.DS math.NA

    A Reexamination of the Communication Bandwidth Cost Analysis of A Parallel Recursive Algorithm for Solving Triangular Systems of Linear Equations

    Authors: Yuan Tang

    Abstract: This paper presents a reexamination of the research paper titled "Communication-Avoiding Parallel Algorithms for TRSM" by Wicky et al. We focus on the communication bandwidth cost analysis presented in the original work and identify potential issues that require clarification or revision. The problem at hand is the need to address inconsistencies and miscalculations found in the analysis, p…

    Submitted 9 April, 2024; originally announced July 2024.

    Comments: 2 pages, comment on arXiv:1612.01855

  50. arXiv:2407.00603  [pdf, other]

    cs.CV

    Hierarchical Memory for Long Video QA

    Authors: Yiqin Wang, Haoji Zhang, Yansong Tang, Yong Liu, Jiashi Feng, Jifeng Dai, Xiaojie Jin

    Abstract: This paper describes our champion solution to the LOVEU Challenge @ CVPR'24, Track 1 (Long Video VQA). Processing long sequences of visual tokens is computationally expensive and memory-intensive, making long video question-answering a challenging task. The key is to compress visual tokens effectively, reducing memory footprint and decoding latency, while preserving the essential information for a…

    Submitted 30 June, 2024; originally announced July 2024.