Showing 1–50 of 653 results for author: Shen, H

Searching in archive cs.
  1. arXiv:2409.09808

    cs.CV cs.AI

    Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion

    Authors: Hui Shen, Zhongwei Wan, Xin Wang, Mi Zhang

    Abstract: Mamba and Vision Mamba (Vim) models have shown their potential as an alternative to methods based on Transformer architecture. This work introduces Fast Mamba for Vision (Famba-V), a cross-layer token fusion technique to enhance the training efficiency of Vim models. The key idea of Famba-V is to identify and fuse similar tokens across different Vim layers based on a suite of cross-layer strategies…

    Submitted 15 September, 2024; originally announced September 2024.

    Comments: Camera-ready version, ECCV 2024 Fourth Workshop on Computational Aspects of Deep Learning

  2. arXiv:2409.09603

    cs.AI cs.CL cs.LG

    Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison

    Authors: Judy Hanwen Shen, Archit Sharma, Jun Qin

    Abstract: The goal of aligning language models to human preferences requires data that reveal these preferences. Ideally, time and money can be spent carefully collecting and tailoring bespoke preference data to each downstream application. However, in practice, a select few publicly available preference datasets are often used to train reward models for reinforcement learning from human feedback (RLHF). Wh…

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: Working Paper

  3. arXiv:2409.09586

    cs.HC cs.AI cs.CL

    ValueCompass: A Framework of Fundamental Values for Human-AI Alignment

    Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Yu-Ju Yang, Tanushree Mitra, Yun Huang

    Abstract: As AI systems become more advanced, ensuring their alignment with a diverse range of individuals and societal values becomes increasingly critical. But how can we capture fundamental human values and assess the degree to which AI systems align with them? We introduce ValueCompass, a framework of fundamental values, grounded in psychological theory and a systematic review, to identify and evaluate…

    Submitted 14 September, 2024; originally announced September 2024.

  4. arXiv:2409.08775

    cs.HC cs.AI

    What You Say = What You Want? Teaching Humans to Articulate Requirements for LLMs

    Authors: Qianou Ma, Weirui Peng, Hua Shen, Kenneth Koedinger, Tongshuang Wu

    Abstract: Prompting ChatGPT to achieve complex goals (e.g., creating a customer support chatbot) often demands meticulous prompt engineering, including aspects like fluent writing and chain-of-thought techniques. While emerging prompt optimizers can automatically refine many of these aspects, we argue that clearly conveying customized requirements (e.g., how to handle diverse inputs) remains a human-centric…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: 15 pages, 5 figures

  5. arXiv:2409.08330

    cs.CL cs.CY cs.HC

    Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue

    Authors: Jonathan Ivey, Shivani Kumar, Jiayu Liu, Hua Shen, Sushrita Rakshit, Rohan Raju, Haotian Zhang, Aparna Ananthasubramaniam, Junghwan Kim, Bowen Yi, Dustin Wright, Abraham Israeli, Anders Giovanni Møller, Lechen Zhang, David Jurgens

    Abstract: Studying and building datasets for dialogue tasks is both expensive and time-consuming due to the need to recruit, train, and collect data from study participants. In response, much recent work has sought to use large language models (LLMs) to simulate both human-human and human-LLM interactions, as they have been shown to generate convincingly human-like text in many settings. However, to what ex…

    Submitted 16 September, 2024; v1 submitted 12 September, 2024; originally announced September 2024.

  6. arXiv:2409.05840

    cs.CL

    MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

    Authors: Run Luo, Haonan Zhang, Longze Chen, Ting-En Lin, Xiong Liu, Yuchuan Wu, Min Yang, Minzheng Wang, Pengpeng Zeng, Lianli Gao, Heng Tao Shen, Yunshui Li, Xiaobo Xia, Fei Huang, Jingkuan Song, Yongbin Li

    Abstract: The development of Multimodal Large Language Models (MLLMs) has seen significant advancements with increasing demands in various fields (e.g., multimodal agents, embodied intelligence). While model-driven approaches attempt to enhance MLLMs' capabilities through diverse architectures, the gains have become increasingly marginal. Conversely, data-driven methods, which scale up image-text instruction…

    Submitted 15 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

  7. arXiv:2409.04682

    cs.IT

    Hybrid Beamforming with Widely-spaced-array for Multi-user Cross-Near-and-Far-Field Communications

    Authors: Heyin Shen, Yuhang Chen, Chong Han, Jinhong Yuan

    Abstract: With multi-GHz bandwidth, Terahertz (THz) beamforming has drawn increasing attention in the sixth generation (6G) and beyond communications. Existing beamforming designs mainly focus on a compact antenna array where typical communication occurs in the far-field. However, in dense multi-user scenarios, only relying on far-field angle domain fails to distinguish users at similar angles. Therefore, a…

    Submitted 6 September, 2024; originally announced September 2024.

  8. arXiv:2409.03996

    cs.LG cs.RO

    Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance

    Authors: RenMing Huang, Shaochong Liu, Yunqiang Pei, Peng Wang, Guoqing Wang, Yang Yang, Hengtao Shen

    Abstract: In this work, we address the challenging problem of long-horizon goal-reaching policy learning from non-expert, action-free observation data. Unlike fully labeled expert data, our data is more accessible and avoids the costly process of action labeling. Additionally, compared to online learning, which often involves aimless exploration, our data provides useful guidance for more efficient explorat…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted to CoRL 2024

  9. arXiv:2409.03386

    cs.IT eess.SP

    Movable Antennas: Channel Measurement, Modeling, and Performance Evaluation

    Authors: Yiqin Wang, Heyin Shen, Chong Han, Meixia Tao

    Abstract: For decades, multi-antenna systems have been a key enabling technology in the evolution of wireless communication systems. In contrast to conventional multi-antenna systems that contain antennas at fixed positions, position-flexible antenna systems have been proposed to fully utilize the spatial variation of wireless channels. In this paper, movable antenna (MA) systems are analyzed from channel me…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 12 pages, 31 figures

  10. arXiv:2409.00942

    cs.CV

    VQ-Flow: Taming Normalizing Flows for Multi-Class Anomaly Detection via Hierarchical Vector Quantization

    Authors: Yixuan Zhou, Xing Xu, Zhe Sun, Jingkuan Song, Andrzej Cichocki, Heng Tao Shen

    Abstract: Normalizing flows, a category of probabilistic models famed for their capabilities in modeling complex data distributions, have exhibited remarkable efficacy in unsupervised anomaly detection. This paper explores the potential of normalizing flows in multi-class anomaly detection, wherein the normal data is compounded with multiple classes without providing class labels. Through the integration of…

    Submitted 2 September, 2024; originally announced September 2024.

  11. arXiv:2409.00941

    cs.IT eess.SP

    Frequency-Position-Fluid Antenna Array for Ultra-dense Connectivity in Terahertz Beamforming Systems

    Authors: Heyin Shen, Chong Han, Hao Liu, Tao Yang

    Abstract: The position-fluid antenna (PFA) architecture has become one of the appealing technologies to support ubiquitous connectivity demand in next-generation wireless systems. Specifically, allowing the antenna to adjust its physical position to one of the predefined ports within a fixed region can introduce additional spatial diversity and improve the signal-to-interference-plus-noise ratio (SINR). In…

    Submitted 2 September, 2024; originally announced September 2024.

  12. arXiv:2409.00862

    cs.HC

    User-Driven Value Alignment: Understanding Users' Perceptions and Strategies for Addressing Biased and Discriminatory Statements in AI Companions

    Authors: Xianzhe Fan, Qing Xiao, Xuhui Zhou, Jiaxin Pei, Maarten Sap, Zhicong Lu, Hong Shen

    Abstract: Large language model-based AI companions are increasingly viewed by users as friends or romantic partners, leading to deep emotional bonds. However, they can generate biased, discriminatory, and harmful outputs. Recently, users have begun taking the initiative to address these harms and re-align AI companions. We introduce the concept of user-driven value alignment, where users actively identify, challen…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 17 pages, 1 figure

  13. arXiv:2408.17245

    cs.NE

    Stepwise Weighted Spike Coding for Deep Spiking Neural Networks

    Authors: Yiwen Gu, Junchuan Gu, Haibin Shen, Kejie Huang

    Abstract: Spiking Neural Networks (SNNs) seek to mimic the spiking behavior of biological neurons and are expected to play a key role in the advancement of neural computing and artificial intelligence. The efficiency of SNNs is often determined by the neural coding schemes. Existing coding schemes either cause huge delays and energy consumption or necessitate intricate neuron models and training techniques.…

    Submitted 30 August, 2024; originally announced August 2024.

  14. arXiv:2408.13040

    eess.AS cs.AI cs.CL cs.LG

    SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks

    Authors: Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-Wen Li, Hung-yi Lee

    Abstract: Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages. It allows an LM to adapt to new tasks with minimal training and parameter updates, thus achieving efficiency in both storage and computation. Additionally, prompting modifies only the LM's inputs and harnesses the generative capabilities of language models to address va…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

    Journal ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 3730-3744, 2024

  15. arXiv:2408.10666

    cs.IR

    Accelerating the Surrogate Retraining for Poisoning Attacks against Recommender Systems

    Authors: Yunfan Wu, Qi Cao, Shuchang Tao, Kaike Zhang, Fei Sun, Huawei Shen

    Abstract: Recent studies have demonstrated the vulnerability of recommender systems to data poisoning attacks, where adversaries inject carefully crafted fake user interactions into the training data of recommenders to promote target items. Current attack methods involve iteratively retraining a surrogate recommender on the poisoned data with the latest fake users to optimize the attack. However, this repet…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by RecSys 2024

  16. arXiv:2408.10528

    cs.CL

    NoMatterXAI: Generating "No Matter What" Alterfactual Examples for Explaining Black-Box Text Classification Models

    Authors: Tuc Nguyen, James Michels, Hua Shen, Thai Le

    Abstract: In Explainable AI (XAI), counterfactual explanations (CEs) are a well-studied method to communicate feature relevance through contrastive reasoning of "what if" to explain AI models' predictions. However, they only focus on important (i.e., relevant) features and largely disregard less important (i.e., irrelevant) ones. Such irrelevant features can be crucial in many applications, especially when…

    Submitted 20 August, 2024; originally announced August 2024.

  17. arXiv:2408.07906

    cs.LG cs.AI cs.NE math.NA

    KAN versus MLP on Irregular or Noisy Functions

    Authors: Chen Zeng, Jiahui Wang, Haoran Shen, Qiao Wang

    Abstract: In this paper, we compare the performance of Kolmogorov-Arnold Networks (KAN) and Multi-Layer Perceptron (MLP) networks on irregular or noisy functions. We control the number of parameters and the size of the training samples to ensure a fair comparison. For clarity, we categorize the functions into six types: regular functions, continuous functions with local non-differentiable points, functions…

    Submitted 14 August, 2024; originally announced August 2024.

  18. arXiv:2408.06740

    cs.CV cs.AI

    DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion

    Authors: Yujia Wu, Yiming Shi, Jiwei Wei, Chengwei Sun, Yuyang Zhou, Yang Yang, Heng Tao Shen

    Abstract: Personalized text-to-image generation has gained significant attention for its capability to generate high-fidelity portraits of specific identities conditioned on user-defined prompts. Existing methods typically involve test-time fine-tuning or instead incorporating an additional pre-trained branch. However, these approaches struggle to simultaneously address the demands of efficiency, identity f…

    Submitted 18 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 9 pages, 8 figures

  19. arXiv:2408.06658

    cs.SI

    ComGPT: Detecting Local Community Structure with Large Language Models

    Authors: Li Ni, Haowen Shen, Lin Mu, Yiwen Zhang, Wenjian Luo

    Abstract: Large Language Models (LLMs), like GPT, have demonstrated the ability to understand graph structures and have achieved excellent performance in various graph reasoning tasks, such as node classification. Despite their strong abilities in graph reasoning tasks, they lack specific domain knowledge and have a weaker understanding of community-related graph information, which hinders their capabilitie…

    Submitted 12 September, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  20. arXiv:2408.06359

    eess.SP cs.AI cs.LG

    An Adaptive CSI Feedback Model Based on BiLSTM for Massive MIMO-OFDM Systems

    Authors: Hongrui Shen, Long Zhao, Kan Zheng, Yuhua Cao, Pingzhi Fan

    Abstract: Deep learning (DL)-based channel state information (CSI) feedback has the potential to improve the recovery accuracy and reduce the feedback overhead in massive multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. However, the length of input CSI and the number of feedback bits should be adjustable in different scenarios, which cannot be efficiently achie…

    Submitted 26 July, 2024; originally announced August 2024.

    Comments: 13 pages, 14 figures, 3 tables

  21. arXiv:2408.05775

    cs.CV

    Efficient Test-Time Prompt Tuning for Vision-Language Models

    Authors: Yuhan Zhu, Guozhen Zhang, Chen Xu, Haocheng Shen, Xiaoxin Chen, Gangshan Wu, Limin Wang

    Abstract: Vision-language models have showcased impressive zero-shot classification capabilities when equipped with suitable text prompts. Previous studies have shown the effectiveness of test-time prompt tuning; however, these methods typically require per-image prompt adaptation during inference, which incurs high computational budgets and limits scalability and practical deployment. To overcome this issu…

    Submitted 11 August, 2024; originally announced August 2024.

  22. arXiv:2408.04901

    cs.RO

    CTE-MLO: Continuous-time and Efficient Multi-LiDAR Odometry with Localizability-aware Point Cloud Sampling

    Authors: Hongming Shen, Zhenyu Wu, Wei Wang, Qiyang Lyu, Huiqin Zhou, Tianchen Deng, Yeqing Zhu, Danwei Wang

    Abstract: In recent years, LiDAR-based localization and mapping methods have achieved significant progress thanks to their reliable and real-time localization capability. Because single-LiDAR odometry often faces hardware failures and degradation in practical scenarios, Multi-LiDAR Odometry (MLO), as an emerging technology, is studied to enhance the performance of LiDAR-based localization and mapping sy…

    Submitted 9 August, 2024; originally announced August 2024.

  23. arXiv:2408.04154

    cs.LG cs.AI stat.ML

    The Data Addition Dilemma

    Authors: Judy Hanwen Shen, Inioluwa Deborah Raji, Irene Y. Chen

    Abstract: In many machine learning for healthcare tasks, standard datasets are constructed by amassing data across many, often fundamentally dissimilar, sources. But when does adding more data help, and when does it hinder progress on desired model outcomes in real-world settings? We identify this situation as the Data Addition Dilemma, demonstrating that adding training data in this multi-source s…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Machine Learning For Health Care 2024 (MLHC)

  24. arXiv:2408.04107

    cs.LG cs.DC

    Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference

    Authors: Zeyu Zhang, Haiying Shen

    Abstract: In large language models, memory constraints in the key-value cache (KVC) pose a challenge during inference, especially with long prompts. In this work, we observed that compressing KV values is more effective than compressing the model regarding accuracy and job completion time (JCT). However, quantizing KV values and dropping less-important tokens incur significant runtime computational time ove…

    Submitted 7 August, 2024; originally announced August 2024.

  25. arXiv:2408.00491

    cs.CL cs.CV cs.MM

    GalleryGPT: Analyzing Paintings with Large Multimodal Models

    Authors: Yi Bin, Wenhao Shi, Yujuan Ding, Zhiqiang Hu, Zheng Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen

    Abstract: Artwork analysis is an important and fundamental skill for art appreciation, which could enrich personal aesthetic sensibility and foster critical thinking. Understanding artworks is challenging due to their subjective nature, diverse interpretations, and complex visual elements, requiring expertise in art history, cultural background, and aesthetic theory. However, limited by the data…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted as Oral Presentation at ACM Multimedia 2024

  26. Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning

    Authors: Yi Bin, Junrong Liao, Yujuan Ding, Haoxuan Li, Yang Yang, See-Kiong Ng, Heng Tao Shen

    Abstract: Cross-modal coherence modeling is essential for intelligent systems to help them organize and structure information, thereby understanding and creating content of the physical world coherently like human beings. Previous work on cross-modal coherence modeling attempted to leverage the order information from another modality to assist the coherence recovery of the target modality. Despite the…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Multimedia 2024

  27. arXiv:2407.19082

    cs.LG cs.AI cs.CV cs.GR cs.HC

    Regularized Multi-Decoder Ensemble for an Error-Aware Scene Representation Network

    Authors: Tianyu Xiong, Skylar W. Wurster, Hanqi Guo, Tom Peterka, Han-Wei Shen

    Abstract: Feature grid Scene Representation Networks (SRNs) have been applied to scientific data as compact functional surrogates for analysis and visualization. As SRNs are black-box lossy data representations, assessing the prediction quality is critical for scientific visualization applications to ensure that scientists can trust the information being visualized. Currently, existing architectures do not…

    Submitted 5 August, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: To be published in Proc. IEEE VIS 2024

  28. arXiv:2407.17757

    cs.CV cs.RO

    CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions

    Authors: Haicheng Liao, Haoyu Sun, Huanming Shen, Chengyue Wang, Kahou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li

    Abstract: Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs). This task presents substantial challenges stemming from the unpredictable nature of traffic accidents, their long-tail distribution, the intricacies of traffic scene dynamics, and the inherently constrained field of vision of onboard cameras. To…

    Submitted 25 July, 2024; originally announced July 2024.

  29. arXiv:2407.17730

    cs.CL

    Are Large Language Models Possible to Conduct Cognitive Behavioral Therapy?

    Authors: Hao Shen, Zihan Li, Minqiang Yang, Minghui Ni, Yongfeng Tao, Zhengyang Yu, Weihao Zheng, Chen Xu, Bin Hu

    Abstract: In contemporary society, the issue of psychological health has become increasingly prominent, characterized by the diversification, complexity, and universality of mental disorders. Cognitive Behavioral Therapy (CBT), currently the most influential and clinically effective psychological treatment method with no side effects, has limited coverage and poor quality in most countries. In recent years,…

    Submitted 24 July, 2024; originally announced July 2024.

  30. arXiv:2407.16142

    cs.LG cs.AI cs.CV cs.RO

    Diffusion Models as Optimizers for Efficient Planning in Offline RL

    Authors: Renming Huang, Yunqiang Pei, Guoqing Wang, Yangming Zhang, Yang Yang, Peng Wang, Hengtao Shen

    Abstract: Diffusion models have shown strong competitiveness in offline reinforcement learning tasks by formulating decision-making as sequential generation. However, the practicality of these methods is limited due to the lengthy inference processes they require. In this paper, we address this problem by decomposing the sampling process of diffusion models into two decoupled subprocesses: 1) generating a f…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: The paper was accepted by ECCV 2024

  31. arXiv:2407.14882

    cs.LG cs.AI math.NA

    Reduced Effectiveness of Kolmogorov-Arnold Networks on Functions with Noise

    Authors: Haoran Shen, Chen Zeng, Jiahui Wang, Qiao Wang

    Abstract: It has been observed that even a small amount of noise introduced into the dataset can significantly degrade the performance of KAN. In this brief note, we aim to quantitatively evaluate the performance when noise is added to the dataset. We propose an oversampling technique combined with denoising to alleviate the impact of noise. Specifically, we employ kernel filtering based on diffusion maps f…

    Submitted 20 July, 2024; originally announced July 2024.

    MSC Class: 68T07

  32. arXiv:2407.12884

    cs.LG cs.AI cs.CV cs.GR cs.HC

    SurroFlow: A Flow-Based Surrogate Model for Parameter Space Exploration and Uncertainty Quantification

    Authors: Jingyi Shen, Yuhan Duan, Han-Wei Shen

    Abstract: Existing deep learning-based surrogate models facilitate efficient data generation, but fall short in uncertainty quantification, efficient parameter space exploration, and reverse prediction. In our work, we introduce SurroFlow, a novel normalizing flow-based surrogate model, to learn the invertible transformation between simulation parameters and simulation outputs. The model not only allows acc…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: To be published in Proc. IEEE VIS 2024

  33. arXiv:2407.10499

    cs.CL

    CIBench: Evaluating Your LLMs with a Code Interpreter Plugin

    Authors: Songyang Zhang, Chuyu Zhang, Yingfan Hu, Haowen Shen, Kuikun Liu, Zerun Ma, Fengzhe Zhou, Wenwei Zhang, Xuming He, Dahua Lin, Kai Chen

    Abstract: While LLM-based agents, which use external tools to solve complex problems, have made significant progress, benchmarking their ability is challenging, thereby hindering a clear understanding of their limitations. In this paper, we propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks. Our evaluation f…

    Submitted 25 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Under review. The first three authors contributed equally, and Songyang Zhang is the project leader

  34. arXiv:2407.08148

    cs.CV

    SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

    Authors: Runmin Zhang, Jun Ma, Si-Yuan Cao, Lun Luo, Beinan Yu, Shu-Jie Chen, Junwei Li, Hui-Liang Shen

    Abstract: We propose a novel unsupervised cross-modal homography estimation framework based on intra-modal Self-supervised learning, Correlation, and consistent feature map Projection, namely SCPNet. The concept of intra-modal self-supervised learning is first presented to facilitate the unsupervised cross-modal homography estimation. The correlation-based homography estimation network and the consistent fe…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  35. arXiv:2407.07026

    cs.CV cs.CL cs.MM cs.SI

    Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition

    Authors: Daiqing Wu, Dongbao Yang, Huawen Shen, Can Ma, Yu Zhou

    Abstract: With the proliferation of social media posts in recent years, the need to detect sentiments in multimodal (image-text) content has grown rapidly. Since posts are user-generated, the image and text from the same post can express different or even contradictory sentiments, leading to potential sentiment discrepancy. However, existing works mainly adopt a single-branch fusion structure that…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures

  36. arXiv:2407.06303

    cs.CV cs.LG

    Unsupervised Fault Detection using SAM with a Moving Window Approach

    Authors: Ahmed Maged, Herman Shen

    Abstract: Automated fault detection and monitoring in engineering are critical but frequently difficult owing to the need to collect and label large amounts of defective samples. We present an unsupervised method that uses the high-end Segment Anything Model (SAM) and a moving window approach. SAM has gained recognition in AI image segmentation communities for its accuracy and versatility. How…

    Submitted 8 July, 2024; originally announced July 2024.

  37. arXiv:2407.03106

    cs.CV

    Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric

    Authors: Xiruo Jiang, Yazhou Yao, Xili Dai, Fumin Shen, Xian-Sheng Hua, Heng-Tao Shen

    Abstract: Deep metric learning (DML) aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval. Prior literature predominantly focuses on pair-based and proxy-based methods to maximize inter-class discrepancy and minimize intra-class diversity. However, these methods tend to suffer from the collapse of the embedding space due to their…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: accepted by IEEE Transactions on Multimedia

  38. arXiv:2407.00499

    cs.CL cs.AI cs.LG

    ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees

    Authors: Zhiyuan Wang, Jinhao Duan, Lu Cheng, Yue Zhang, Qingni Wang, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

    Abstract: Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures, 6 tables

  39. arXiv:2407.00132

    cs.SE cs.AI

    ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

    Authors: Haiyang Shen, Yue Li, Desong Meng, Dongqi Cai, Sheng Qi, Li Zhang, Mengwei Xu, Yun Ma

    Abstract: Recent advancements in integrating large language models (LLMs) with application programming interfaces (APIs) have gained significant interest in both academia and industry. These API-based agents, leveraging the strong autonomy and planning capabilities of LLMs, can efficiently solve problems requiring multi-step actions. However, their ability to handle multi-dimensional difficulty levels, dive…

    Submitted 22 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

  40. arXiv:2406.16150

    eess.IV cs.CV

    Intensity Confusion Matters: An Intensity-Distance Guided Loss for Bronchus Segmentation

    Authors: Haifan Gong, Wenhao Huang, Huan Zhang, Yu Wang, Xiang Wan, Hong Shen, Guanbin Li, Haofeng Li

    Abstract: Automatic segmentation of the bronchial tree from CT imaging is important, as it provides structural information for disease diagnosis. Despite the merits of previous automatic bronchus segmentation methods, they have paid less attention to the issue we term Intensity Confusion, wherein the intensity values of certain background voxels approach those of the foreground voxels within br…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: IEEE International Conference on Multimedia & Expo (ICME) 2024

  41. arXiv:2406.13375

    cs.CL

    ALiiCE: Evaluating Positional Fine-grained Citation Generation

    Authors: Yilong Xu, Jinhua Gao, Xiaoming Yu, Baolong Bi, Huawei Shen, Xueqi Cheng

    Abstract: Large Language Models (LLMs) can enhance their credibility and verifiability by generating text with citations. However, existing tasks and evaluation methods are predominantly limited to sentence-level statements, neglecting the significance of positional fine-grained citations that can appear anywhere within sentences. To facilitate further exploration of the fine-grained citation generation, we pr…

    Submitted 10 September, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  42. arXiv:2406.11263

    cs.CL cs.AI

    The Fall of ROME: Understanding the Collapse of LLMs in Model Editing

    Authors: Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Du Su, Dawei Yin, Huawei Shen

    Abstract: Despite significant progress in model editing methods, their application in real-world scenarios remains challenging as they often cause large language models (LLMs) to collapse. Among them, ROME is particularly concerning, as it could disrupt LLMs with only a single edit. In this paper, we study the root causes of such collapse. Through extensive analysis, we identify two primary factors that con…

    Submitted 17 June, 2024; originally announced June 2024.

  43. arXiv:2406.09264  [pdf, other]

    cs.HC cs.AI cs.CL

    Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

    Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

    Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve th…

    Submitted 10 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: proposing "bidirectional human-AI alignment" framework after a systematic review of over 400 alignment papers

  44. arXiv:2406.07146  [pdf, other]

    cs.CV cs.AI

    Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images

    Authors: Che Liu, Zhongwei Wan, Yuqi Wang, Hui Shen, Haozhe Wang, Kangyu Zheng, Mi Zhang, Rossella Arcucci

    Abstract: Automatic radiology report generation can significantly benefit the labor-intensive process of report writing by radiologists, especially for 3D radiographs like CT scans, which are crucial for broad clinical diagnostics yet underexplored compared to 2D radiographs. Existing methods often handle 3D volumes either slice-wise or with aggressive downsampling due to current GPU memory limitations, whi…

    Submitted 12 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  45. arXiv:2406.06305  [pdf, other]

    cs.CV cs.AI

    NeuroMoCo: A Neuromorphic Momentum Contrast Learning Method for Spiking Neural Networks

    Authors: Yuqi Ma, Huamin Wang, Hangchi Shen, Xuemei Chen, Shukai Duan, Shiping Wen

    Abstract: Recently, brain-inspired spiking neural networks (SNNs) have attracted great research attention owing to their inherent bio-interpretability, event-triggered properties and powerful perception of spatiotemporal information, which is beneficial to handling event-based neuromorphic datasets. In contrast to conventional static image datasets, event-based neuromorphic datasets present heightened compl…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 32 pages, 4 figures, 4 tables

  46. arXiv:2406.05271  [pdf, other]

    cs.CV

    USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation

    Authors: Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han Wei Shen, Liu Ren

    Abstract: The open-vocabulary image segmentation task involves partitioning images into semantically meaningful segments and classifying them with flexible text-defined categories. The recent vision-based foundation models such as the Segment Anything Model (SAM) have shown superior performance in generating class-agnostic image segments. The main challenge in open-vocabulary image segmentation now lies in…

    Submitted 7 June, 2024; originally announced June 2024.

  47. arXiv:2406.03888  [pdf, ps, other]

    cs.IT eess.SP

    MSE-Based Training and Transmission Optimization for MIMO ISAC Systems

    Authors: Zhenyao He, Wei Xu, Hong Shen, Yonina C. Eldar, Xiaohu You

    Abstract: In this paper, we investigate a multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) system under typical block-fading channels. As a non-trivial extension to most existing works on ISAC, both the training and transmission signals sent by the ISAC transmitter are exploited for sensing. Specifically, we develop two training and transmission design schemes to minimize a…

    Submitted 6 June, 2024; originally announced June 2024.

  48. arXiv:2406.00944  [pdf, other]

    cs.CL cs.AI cs.IR

    Unveil the Duality of Retrieval-Augmented Generation: Theoretical Analysis and Practical Solution

    Authors: Shicheng Xu, Liang Pang, Huawei Shen, Xueqi Cheng

    Abstract: Retrieval-augmented generation (RAG) utilizes retrieved texts to enhance large language models (LLMs). However, studies show that RAG is not consistently effective and can even mislead LLMs due to noisy or incorrect retrieved texts. This suggests that RAG possesses a duality including both benefit and detriment. Although many existing methods attempt to address this issue, they lack a theoretical…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 23 pages

  49. arXiv:2405.20071  [pdf]

    physics.med-ph cs.LG

    A Staged Approach using Machine Learning and Uncertainty Quantification to Predict the Risk of Hip Fracture

    Authors: Anjum Shaik, Kristoffer Larsen, Nancy E. Lane, Chen Zhao, Kuan-Jui Su, Joyce H. Keyak, Qing Tian, Qiuying Sha, Hui Shen, Hong-Wen Deng, Weihua Zhou

    Abstract: Despite advancements in medical care, hip fractures impose a significant burden on individuals and healthcare systems. This paper focuses on the prediction of hip fracture risk in older and middle-aged adults, where falls and compromised bone quality are predominant factors. We propose a novel staged model that combines advanced imaging and clinical data to improve predictive performance. By using…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 29 pages, 5 figures, 6 tables

  50. arXiv:2405.19660  [pdf, other]

    cs.CL

    PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals

    Authors: Ruiyi Wang, Stephanie Milani, Jamie C. Chiu, Jiayin Zhi, Shaun M. Eack, Travis Labrum, Samuel M. Murphy, Nev Jones, Kate Hardy, Hong Shen, Fei Fang, Zhiyu Zoey Chen

    Abstract: Mental illness remains one of the most critical public health issues. Despite its importance, many mental health professionals highlight a disconnect between their training and actual real-world patient practice. To help bridge this gap, we propose PATIENT-Ψ, a novel patient simulation framework for cognitive behavior therapy (CBT) training. To build PATIENT-Ψ, we construct diverse patient cogniti…

    Submitted 18 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 9 pages, 5 figures