Showing 1–50 of 829 results for author: He, Z

Searching in archive cs.
  1. arXiv:2409.02152  [pdf]

    cs.SI cs.AI

    Fair Railway Network Design

    Authors: Zixu He, Sirin Botan, Jérôme Lang, Abdallah Saffidine, Florian Sikora, Silas Workman

    Abstract: When designing a public transportation network in a country, one may want to minimise the sum of travel duration of all inhabitants. This corresponds to a purely utilitarian view and does not involve any fairness consideration, as the resulting network will typically benefit the capital city and/or large central cities while leaving some peripheral cities behind. On the other hand, a more egalitar…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 32 pages, 18 figures

  2. arXiv:2409.01559  [pdf, other]

    cs.RO

    PR2: A Physics- and Photo-realistic Testbed for Embodied AI and Humanoid Robots

    Authors: Hangxin Liu, Qi Xie, Zeyu Zhang, Tao Yuan, Xiaokun Leng, Lining Sun, Song-Chun Zhu, Jingwen Zhang, Zhicheng He, Yao Su

    Abstract: This paper presents the development of a Physics-realistic and Photo-realistic humanoid robot testbed, PR2, to facilitate collaborative research between Embodied Artificial Intelligence (Embodied AI) and robotics. PR2 offers high-quality scene rendering and robot dynamic simulation, enabling (i) the creation of diverse scenes using various digital assets, (ii) the integration of advanc…

    Submitted 2 September, 2024; originally announced September 2024.

  3. arXiv:2409.00968  [pdf, other]

    math.OC cs.AI cs.LG

    Solving Integrated Process Planning and Scheduling Problem via Graph Neural Network Based Deep Reinforcement Learning

    Authors: Hongpei Li, Han Zhang, Ziyan He, Yunkai Jia, Bo Jiang, Xiang Huang, Dongdong Ge

    Abstract: The Integrated Process Planning and Scheduling (IPPS) problem combines process route planning and shop scheduling to achieve high efficiency in manufacturing and maximize resource utilization, which is crucial for modern manufacturing systems. Traditional methods using Mixed Integer Linear Programming (MILP) and heuristic algorithms can not well balance solution quality and speed when solving IPPS…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 24 pages, 13 figures

  4. arXiv:2409.00755  [pdf, other]

    cs.CV cs.AI

    Trusted Unified Feature-Neighborhood Dynamics for Multi-View Classification

    Authors: Haojian Huang, Chuanyu Qin, Zhe Liu, Kaijing Ma, Jin Chen, Han Fang, Chao Ban, Hao Sun, Zhongjiang He

    Abstract: Multi-view classification (MVC) faces inherent challenges due to domain gaps and inconsistencies across different views, often resulting in uncertainties during the fusion process. While Evidential Deep Learning (EDL) has been effective in addressing view uncertainty, existing methods predominantly rely on the Dempster-Shafer combination rule, which is sensitive to conflicting evidence and often n…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Ongoing work: 13 pages, 13 figures, 12 tables

  5. arXiv:2409.00743  [pdf, other]

    cs.LG cs.AI

    Interpretable Clustering: A Survey

    Authors: Lianyu Hu, Mudi Jiang, Junjie Dong, Xinying Liu, Zengyou He

    Abstract: In recent years, much of the research on clustering algorithms has primarily focused on enhancing their accuracy and efficiency, frequently at the expense of interpretability. However, as these methods are increasingly being applied in high-stakes domains such as healthcare, finance, and autonomous systems, the need for transparent and interpretable clustering outcomes has become a critical concer…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 11 pages, 2 figures

  6. arXiv:2408.16431  [pdf, other]

    cs.CV

    Discriminative Spatial-Semantic VOS Solution: 1st Place Solution for 6th LSVOS

    Authors: Deshui Miao, Yameng Gu, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang

    Abstract: Video object segmentation (VOS) is a crucial task in computer vision, but current VOS methods struggle with complex scenes and prolonged object motions. To address these challenges, the MOSE dataset aims to enhance object recognition and differentiation in complex environments, while the LVOS dataset focuses on segmenting objects exhibiting long-term, intricate movements. This report introduces a…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 1st Place Solution for 6th LSVOS VOS Track. arXiv admin note: substantial text overlap with arXiv:2406.04600

  7. arXiv:2408.15549  [pdf, other]

    cs.CL

    WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback

    Authors: Taiwei Shi, Zhuoer Wang, Longqi Yang, Ying-Chun Lin, Zexue He, Mengting Wan, Pei Zhou, Sujay Jauhar, Xiaofeng Xu, Xia Song, Jennifer Neville

    Abstract: As large language models (LLMs) continue to advance, aligning these models with human preferences has emerged as a critical challenge. Traditional alignment methods, relying on human or LLM annotated datasets, are limited by their resource-intensive nature, inherent subjectivity, and the risk of feedback loops that amplify model biases. To overcome these limitations, we introduce WildFeedback, a n…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 24 pages

  8. arXiv:2408.13983  [pdf, other]

    cs.CV

    Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation

    Authors: Yushun Tang, Shuoshuo Chen, Zhihe Lu, Xinchao Wang, Zhihai He

    Abstract: Transformer-based methods have achieved remarkable success in various machine learning tasks. How to design efficient test-time adaptation methods for transformer models becomes an important research task. In this work, motivated by the dual-subband wavelet lifting scheme developed in multi-scale signal processing which is able to efficiently separate the input signals into principal components an…

    Submitted 25 August, 2024; originally announced August 2024.

  9. arXiv:2408.13898  [pdf, other]

    cs.CV

    Evaluating Attribute Comprehension in Large Vision-Language Models

    Authors: Haiwen Zhang, Zixi Yang, Yuanzhi Liu, Xinran Wang, Zheqi He, Kongming Liang, Zhanyu Ma

    Abstract: Currently, large vision-language models have gained promising progress on many downstream tasks. However, they still suffer many challenges in fine-grained visual understanding tasks, such as object attribute comprehension. Besides, there have been growing efforts on the evaluations of large vision-language models, but lack of in-depth study of attribute comprehension and the visual language fine-…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 15 pages, 4 figures

  10. arXiv:2408.12664  [pdf, other]

    cs.AI q-bio.NC

    Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience

    Authors: Zhonghao He, Jascha Achterberg, Katie Collins, Kevin Nejad, Danyal Akarca, Yinzhu Yang, Wes Gurnee, Ilia Sucholutsky, Yuhan Tang, Rebeca Ianov, George Ogden, Chole Li, Kai Sandbrink, Stephen Casper, Anna Ivanova, Grace W. Lindsay

    Abstract: As deep learning systems are scaled up to many billions of parameters, relating their internal structure to external behaviors becomes very challenging. Although daunting, this problem is not new: Neuroscientists and cognitive scientists have accumulated decades of experience analyzing a particularly complex system - the brain. In this work, we argue that interpreting both biological and artificia…

    Submitted 25 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  11. arXiv:2408.11795  [pdf, other]

    cs.CV

    EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model

    Authors: Feipeng Ma, Yizhou Zhou, Hebei Li, Zilong He, Siying Wu, Fengyun Rao, Yueyi Zhang, Xiaoyan Sun

    Abstract: In the realm of multimodal research, numerous studies leverage substantial image-text pairs to conduct modal alignment learning, transforming Large Language Models (LLMs) into Multimodal LLMs and excelling in a variety of visual-language tasks. The prevailing methodologies primarily fall into two categories: self-attention-based and cross-attention-based methods. While self-attention-based methods…

    Submitted 21 August, 2024; originally announced August 2024.

  12. arXiv:2408.10946  [pdf, other]

    cs.AI

    Large Language Model Driven Recommendation

    Authors: Anton Korikov, Scott Sanner, Yashar Deldjoo, Zhankui He, Julian McAuley, Arnau Ramisa, Rene Vidal, Mahesh Sathiamoorthy, Atoosa Kasrizadeh, Silvia Milano, Francesco Ricci

    Abstract: While previous chapters focused on recommendation systems (RSs) based on standardized, non-verbal user feedback such as purchases, views, and clicks -- the advent of LLMs has unlocked the use of natural language (NL) interactions for recommendation. This chapter discusses how LLMs' abilities for general NL reasoning present novel opportunities to build highly personalized RSs -- which can effectiv…

    Submitted 20 August, 2024; originally announced August 2024.

  13. arXiv:2408.09538  [pdf, other]

    quant-ph cs.ET

    Parameter Setting Heuristics Make the Quantum Approximate Optimization Algorithm Suitable for the Early Fault-Tolerant Era

    Authors: Zichang He, Ruslan Shaydulin, Dylan Herman, Changhao Li, Rudy Raymond, Shree Hari Sureshbabu, Marco Pistoia

    Abstract: Quantum Approximate Optimization Algorithm (QAOA) is one of the most promising quantum heuristics for combinatorial optimization. While QAOA has been shown to perform well on small-scale instances and to provide an asymptotic speedup over state-of-the-art classical algorithms for some problems, fault-tolerance is understood to be required to realize this speedup in practice. The low resource requi…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 7 pages, an invited paper at ICCAD 2024 "Exploring Quantum Technologies in Practical Applications" special session

  14. arXiv:2408.09403  [pdf, other]

    cs.AI cs.CV

    Obtaining Optimal Spiking Neural Network in Sequence Learning via CRNN-SNN Conversion

    Authors: Jiahao Su, Kang You, Zekai Xu, Weizhi Xu, Zhezhi He

    Abstract: Spiking neural networks (SNNs) are becoming a promising alternative to conventional artificial neural networks (ANNs) due to their rich neural dynamics and the implementation of energy-efficient neuromorphic chips. However, the non-differential binary communication mechanism makes SNN hard to converge to an ANN-level accuracy. When SNN encounters sequence learning, the situation becomes worse due…

    Submitted 25 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by 33rd International Conference on Artificial Neural Networks

  15. arXiv:2408.09366  [pdf, other]

    cs.CL cs.CY cs.SI

    Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities

    Authors: Minh Duc Chu, Zihao He, Rebecca Dorn, Kristina Lerman

    Abstract: Large language models (LLMs) have shown promise in representing individuals and communities, offering new ways to study complex social dynamics. However, effectively aligning LLMs with specific human groups and systematically assessing the fidelity of the alignment remains a challenge. This paper presents a robust framework for aligning LLMs with online communities via instruction-tuning and compr…

    Submitted 18 August, 2024; originally announced August 2024.

  16. arXiv:2408.08933  [pdf, other]

    cs.IR cs.AI cs.DB

    RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search

    Authors: Meng Chen, Kai Zhang, Zhenying He, Yinan Jing, X. Sean Wang

    Abstract: Approximate Nearest Neighbor Search (ANNS) is a fundamental and critical component in many applications, including recommendation systems and large language model-based applications. With the advancement of multimodal neural models, which transform data from different modalities into a shared high-dimensional space as feature vectors, cross-modal ANNS aims to use the data vector from one modality…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: to be published in PVLDB

  17. arXiv:2408.08930  [pdf, other]

    cs.CR cs.AI cs.CL

    DePrompt: Desensitization and Evaluation of Personal Identifiable Information in Large Language Model Prompts

    Authors: Xiongtao Sun, Gan Liu, Zhipeng He, Hui Li, Xiaoguang Li

    Abstract: Prompt serves as a crucial link in interacting with large language models (LLMs), widely impacting the accuracy and interpretability of model outputs. However, acquiring accurate and high-quality responses necessitates precise prompts, which inevitably pose significant risks of personal identifiable information (PII) leakage. Therefore, this paper proposes DePrompt, a desensitization protection an…

    Submitted 15 August, 2024; originally announced August 2024.

  18. arXiv:2408.07600  [pdf, other]

    cs.CV

    Disentangle and denoise: Tackling context misalignment for video moment retrieval

    Authors: Kaijing Ma, Han Fang, Xianghao Zang, Chao Ban, Lanxiang Zhou, Zhongjiang He, Yongxiang Li, Hao Sun, Zerun Feng, Xingsong Hou

    Abstract: Video Moment Retrieval, which aims to locate in-context video moments according to a natural language query, is an essential task for cross-modal grounding. Existing methods focus on enhancing the cross-modal interactions between all moments and the textual description for video understanding. However, constantly interacting with all locations is unreasonable because of uneven semantic distributio…

    Submitted 14 August, 2024; originally announced August 2024.

  19. arXiv:2408.06494  [pdf, other]

    cs.HC cs.CL cs.CV

    What Color Scheme is More Effective in Assisting Readers to Locate Information in a Color-Coded Article?

    Authors: Ho Yin Ng, Zeyu He, Ting-Hao 'Kenneth' Huang

    Abstract: Color coding, a technique assigning specific colors to cluster information types, has proven advantages in aiding human cognitive activities, especially reading and comprehension. The rise of Large Language Models (LLMs) has streamlined document coding, enabling simple automatic text labeling with various schemes. This has the potential to make color-coding more accessible and benefit more users.…

    Submitted 26 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: This paper will appear at IEEE VIS 2024

  20. arXiv:2408.05842  [pdf, other]

    cs.AI cs.HC

    Evolving Virtual World with Delta-Engine

    Authors: Hongqiu Wu, Zekai Xu, Tianyang Xu, Shize Wei, Yan Wang, Jiale Hong, Weiqi Wu, Hai Zhao, Min Zhang, Zhezhi He

    Abstract: In this paper, we focus on the virtual world, a cyberspace where people can live in. An ideal virtual world shares great similarity with our real world. One of the crucial aspects is its evolving nature, reflected by individuals' capability to grow and thereby influence the objective world. Such dynamics is unpredictable and beyond the reach of existing systems. For this, we propose a speci…

    Submitted 2 September, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

  21. arXiv:2408.03748  [pdf, other]

    cs.CV

    Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model

    Authors: Guoqing Zhu, Honghu Pan, Qiang Wang, Chao Tian, Chao Yang, Zhenyu He

    Abstract: In challenging low light and adverse weather conditions, thermal vision algorithms, especially object detection, have exhibited remarkable potential, contrasting with the frequent struggles encountered by visible vision algorithms. Nevertheless, the efficacy of thermal vision algorithms driven by deep learning models remains constrained by the paucity of available training data samples. To this end, thi…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: accepted by ACM MM 2024/ACM MM24

  22. arXiv:2408.02263  [pdf, other]

    cs.CV

    VoxelTrack: Exploring Voxel Representation for 3D Point Cloud Object Tracking

    Authors: Yuxuan Lu, Jiahao Nie, Zhiwei He, Hongjie Gu, Xudong Lv

    Abstract: Current LiDAR point cloud-based 3D single object tracking (SOT) methods typically rely on point-based representation network. Despite demonstrated success, such networks suffer from some fundamental problems: 1) It contains pooling operation to cope with inherently disordered point clouds, hindering the capture of 3D spatial information that is useful for tracking, a regression task. 2) The adopte…

    Submitted 5 August, 2024; originally announced August 2024.

  23. arXiv:2408.01800  [pdf, other]

    cs.CV

    MiniCPM-V: A GPT-4V Level MLLM on Your Phone

    Authors: Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of par…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: preprint

  24. arXiv:2408.01137  [pdf, other]

    cs.CV

    PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network

    Authors: Changqun Xia, Chenxi Xie, Zhentao He, Tianshu Yu, Jia Li

    Abstract: We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives. To compensate for the lack of HRSOD dataset, we thoughtfully collect a large-scale high resolution salient object detection dataset, called UHRSD, containing 5,920 images from real-world complex scenarios at 4K-8K resolutions. All the images are fi…

    Submitted 2 August, 2024; originally announced August 2024.

  25. arXiv:2408.00557  [pdf, other]

    quant-ph cs.ET

    End-to-End Protocol for High-Quality QAOA Parameters with Few Shots

    Authors: Tianyi Hao, Zichang He, Ruslan Shaydulin, Jeffrey Larson, Marco Pistoia

    Abstract: The quantum approximate optimization algorithm (QAOA) is a quantum heuristic for combinatorial optimization that has been demonstrated to scale better than state-of-the-art classical solvers for some problems. For a given problem instance, QAOA performance depends crucially on the choice of the parameters. While average-case optimal parameters are available in many cases, meaningful performance ga…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 14 pages, 14 figures

  26. arXiv:2407.20265  [pdf, other]

    cs.LG cs.CE

    COEFF-KANs: A Paradigm to Address the Electrolyte Field with KANs

    Authors: Xinhe Li, Zhuoying Feng, Yezeng Chen, Weichen Dai, Zixu He, Yi Zhou, Shuhong Jiao

    Abstract: To reduce the experimental validation workload for chemical researchers and accelerate the design and optimization of high-energy-density lithium metal batteries, we aim to leverage models to automatically predict Coulombic Efficiency (CE) based on the composition of liquid electrolytes. There are mainly two representative paradigms in existing methods: machine learning and deep learning. However,…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures

  27. arXiv:2407.20172  [pdf, other]

    eess.IV cs.AI cs.CV

    LatentArtiFusion: An Effective and Efficient Histological Artifacts Restoration Framework

    Authors: Zhenqi He, Wenrui Liu, Minghao Yin, Kai Han

    Abstract: Histological artifacts pose challenges for both pathologists and Computer-Aided Diagnosis (CAD) systems, leading to errors in analysis. Current approaches for histological artifact restoration, based on Generative Adversarial Networks (GANs) and pixel-level Diffusion Models, suffer from performance limitations and computational inefficiencies. In this paper, we propose a novel framework, LatentArt…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accept to DGM4MICCAI2024

  28. arXiv:2407.17638  [pdf]

    cs.CL

    Time Matters: Examine Temporal Effects on Biomedical Language Models

    Authors: Weisi Liu, Zhe He, Xiaolei Huang

    Abstract: Time roots in applying language models for biomedical applications: models are trained on historical data and will be deployed for new or future data, which may vary from training data. While increasing biomedical tasks have employed state-of-the-art language models, there are very few studies have examined temporal effects on biomedical models when data usually shifts across development and deplo…

    Submitted 11 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted to AMIA 2024 Annual Symposium

  29. arXiv:2407.16418  [pdf, other]

    eess.IV cs.CV

    Accelerating Learned Video Compression via Low-Resolution Representation Learning

    Authors: Zidian Qiu, Zongyao He, Zhi Jin

    Abstract: In recent years, the field of learned video compression has witnessed rapid advancement, exemplified by the latest neural video codecs DCVC-DC that has outperformed the upcoming next-generation codec ECM in terms of compression ratio. Despite this, learned video compression frameworks often exhibit low encoding and decoding speeds primarily due to their increased computational complexity and unnec…

    Submitted 23 July, 2024; originally announced July 2024.

  30. arXiv:2407.15714  [pdf]

    cs.CV cs.AI

    Mamba meets crack segmentation

    Authors: Zhili He, Yu-Hsing Wang

    Abstract: Cracks pose safety risks to infrastructure and cannot be overlooked. The prevailing structures in existing crack segmentation networks predominantly consist of CNNs or Transformers. However, CNNs exhibit a deficiency in global modeling capability, hindering the representation to entire crack features. Transformers can capture long-range dependencies but suffer from high and quadratic complexity. R…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 32 pages, 8 figures. Preprint submitted to Elsevier

  31. arXiv:2407.15353  [pdf, other]

    cs.CL cs.AR

    Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA

    Authors: Yuan Pu, Zhuolun He, Tairu Qiu, Haoyuan Wu, Bei Yu

    Abstract: Retrieval augmented generation (RAG) enhances the accuracy and reliability of generative AI models by sourcing factual information from external databases, which is extensively employed in document-grounded question-answering (QA) tasks. Off-the-shelf RAG flows are well pretrained on general-purpose documents, yet they encounter significant challenges when being applied to knowledge-intensive vert…

    Submitted 26 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted by ICCAD 2024

  32. arXiv:2407.15202  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Exploiting Pre-trained Models for Drug Target Affinity Prediction with Nearest Neighbors

    Authors: Qizhi Pei, Lijun Wu, Zhenyu He, Jinhua Zhu, Yingce Xia, Shufang Xie, Rui Yan

    Abstract: Drug-Target binding Affinity (DTA) prediction is essential for drug discovery. Despite the application of deep learning methods to DTA prediction, the achieved accuracy remain suboptimal. In this work, inspired by the recent success of retrieval methods, we propose $k$NN-DTA, a non-parametric embedding-based retrieval method adopted on a pre-trained DTA prediction model, which can extend the power…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted by 33rd ACM International Conference on Information and Knowledge Management 2024 (CIKM 2024)

  33. arXiv:2407.11463  [pdf, other]

    cs.LG cs.AI cs.CR

    Investigating Imperceptibility of Adversarial Attacks on Tabular Data: An Empirical Analysis

    Authors: Zhipeng He, Chun Ouyang, Laith Alzubaidi, Alistair Barros, Catarina Moreira

    Abstract: Adversarial attacks are a potential threat to machine learning models by causing incorrect predictions through imperceptible perturbations to the input data. While these attacks have been extensively studied in unstructured data like images, applying them to tabular data, poses new challenges. These challenges arise from the inherent heterogeneity and complex feature interdependencies in tabular d…

    Submitted 20 August, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 33 pages

  34. arXiv:2407.10625  [pdf, other]

    cs.CV

    WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models

    Authors: Zijian He, Peixin Chen, Guangrun Wang, Guanbin Li, Philip H. S. Torr, Liang Lin

    Abstract: Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos. Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions, limiting their effectiveness in video try-on applications. Moreover, video-based models require extensive, high-quality data and…

    Submitted 15 July, 2024; originally announced July 2024.

  35. arXiv:2407.10193  [pdf, other]

    cs.CV

    GRAPE: Generalizable and Robust Multi-view Facial Capture

    Authors: Jing Li, Di Kang, Zhenyu He

    Abstract: Deep learning-based multi-view facial capture methods have shown impressive accuracy while being several orders of magnitude faster than a traditional mesh registration pipeline. However, the existing systems (e.g. TEMPEH) are strictly restricted to inference on the data captured by the same camera array used to capture their training data. In this study, we aim to improve the generalization abili…

    Submitted 14 July, 2024; originally announced July 2024.

  36. arXiv:2407.09722  [pdf, other]

    cs.CL cs.LG

    Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference

    Authors: Zongyue Qin, Ziniu Hu, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun

    Abstract: Transformer-based Large language models (LLMs) have demonstrated their power in various tasks, but their inference incurs significant time and energy costs. To accelerate LLM inference, speculative decoding uses a smaller model to propose one sequence of tokens, which are subsequently validated in batch by the target large model. Compared with autoregressive decoding, speculative decoding generate…

    Submitted 12 July, 2024; originally announced July 2024.

  37. arXiv:2407.09083  [pdf, other]

    cs.NE

    BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation

    Authors: Zekai Xu, Kang You, Qinghai Guo, Xiang Wang, Zhezhi He

    Abstract: Spiking neural networks (SNNs), which mimic biological neural system to convey information via discrete spikes, are well known as brain-inspired models with excellent computing efficiency. By utilizing the surrogate gradient estimation for discrete spikes, learning-based SNN training methods that can achieve ultra-low inference latency (number of time-step) emerge recently. Nevertheless, due to th…

    Submitted 14 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: accepted by European Conference on Computer Vision (ECCV) 2024

    Journal ref: European Conference on Computer Vision 2024

  38. arXiv:2407.08672  [pdf, other]

    cs.CV

    NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning

    Authors: Yi Zhang, Chun-Wun Cheng, Ke Yu, Zhihai He, Carola-Bibiane Schönlieb, Angelica I. Aviles-Rivero

    Abstract: In this paper, we consider the problem of prototype-based vision-language reasoning problem. We observe that existing methods encounter three major challenges: 1) escalating resource demands and prolonging training times, 2) contending with excessive learnable parameters, and 3) fine-tuning based only on a single modality. These challenges will hinder their capability to adapt Vision-Language Mode…

    Submitted 11 July, 2024; originally announced July 2024.

  39. arXiv:2407.08273   

    cs.CL

    RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL

    Authors: Zhenhe Wu, Zhongqiu Li, Jie Zhang, Mengxiang Li, Yu Zhao, Ruiyu Fang, Zhongjiang He, Xuelong Li, Zhoujun Li, Shuangyong Song

    Abstract: Large language models (LLMs) with in-context learning have significantly improved the performance of text-to-SQL task. Previous works generally focus on using exclusive SQL generation prompt to improve the LLMs' reasoning ability. However, they are mostly hard to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing database and extracting v…

    Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Further improvement and modification are needed.

  40. arXiv:2407.07760  [pdf, other]

    cs.CV cs.AI

    Learning Spatial-Semantic Features for Robust Video Object Segmentation

    Authors: Xin Li, Deshui Miao, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang

    Abstract: Tracking and segmenting multiple similar objects with complex or separate parts in long-term videos is inherently challenging due to the ambiguity of target parts and identity confusion caused by occlusion, background clutter, and long-term variations. In this paper, we propose a robust video object segmentation framework equipped with spatial-semantic features and discriminative object queries to…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Winner solution of the VOTS2024 Challenge

  41. arXiv:2407.07506  [pdf, other]

    eess.SP cs.AI

    Generative AI for RF Sensing in IoT systems

    Authors: Li Wang, Chao Zhang, Qiyang Zhao, Hang Zou, Samson Lasaulce, Giuseppe Valenzise, Zhuo He, Merouane Debbah

    Abstract: The development of wireless sensing technologies, using signals such as Wi-Fi, infrared, and RF to gather environmental data, has significantly advanced within Internet of Things (IoT) systems. Among these, Radio Frequency (RF) sensing stands out for its cost-effective and non-intrusive monitoring of human activities and environmental changes. However, traditional RF sensing methods face significa…

    Submitted 10 July, 2024; originally announced July 2024.

  42. arXiv:2407.06043  [pdf, other]

    cs.CV

    Test-time adaptation for geospatial point cloud semantic segmentation with distinct domain shifts

    Authors: Puzuo Wang, Wei Yao, Jie Shao, Zhiyi He

    Abstract: Domain adaptation (DA) techniques help deep learning models generalize across data shifts for point cloud semantic segmentation (PCSS). Test-time adaptation (TTA) allows direct adaptation of a pre-trained model to unlabeled data during inference stage without access to source data or additional training, avoiding privacy issues and large computational resources. We address TTA for geospatial PCSS…

    Submitted 8 July, 2024; originally announced July 2024.

  43. arXiv:2407.05238  [pdf, other]

    cs.CV

    P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds

    Authors: Jiahao Nie, Fei Xie, Sifan Zhou, Xueyi Zhou, Dong-Kyu Chae, Zhiwei He

    Abstract: 3D single object tracking (SOT) methods based on appearance matching have long suffered from insufficient appearance information incurred by incomplete, textureless and semantically deficient LiDAR point clouds. While the motion paradigm exploits motion cues instead of appearance matching for tracking, it incurs complex multi-stage processing and a segmentation module. In this paper, we first provide in-…

    Submitted 8 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

    Comments: The source code and pre-trained models are available at https://github.com/haooozi/P2P

  44. arXiv:2407.05205  [pdf, other]

    cs.CY cs.AI cs.LG

    The AI Companion in Education: Analyzing the Pedagogical Potential of ChatGPT in Computer Science and Engineering

    Authors: Zhangying He, Thomas Nguyen, Tahereh Miari, Mehrdad Aliasgari, Setareh Rafatirad, Hossein Sayadi

    Abstract: Artificial Intelligence (AI), with ChatGPT as a prominent example, has recently taken center stage in various domains including higher education, particularly in Computer Science and Engineering (CSE). The AI revolution brings both convenience and controversy, offering substantial benefits while lacking formal guidance on their application. The primary objective of this work is to comprehensively…

    Submitted 23 April, 2024; originally announced July 2024.

    Comments: conference, 13 pages

  45. arXiv:2407.03551  [pdf, other]

    cs.SI cs.CL cs.CY

    Feelings about Bodies: Emotions on Diet and Fitness Forums Reveal Gendered Stereotypes and Body Image Concerns

    Authors: Cinthia Sánchez, Minh Duc Chu, Zihao He, Rebecca Dorn, Stuart Murray, Kristina Lerman

    Abstract: The gendered expectations about ideal body types can lead to body image concerns, dissatisfaction, and in extreme cases, disordered eating and other psychopathologies across the gender spectrum. While research has focused on pro-anorexia online communities that glorify the 'thin ideal', less attention has been given to the broader spectrum of body image concerns or how emerging disorders like musc…

    Submitted 3 July, 2024; originally announced July 2024.

  46. arXiv:2407.03157  [pdf, other]

    cs.CL cs.AI cs.LG cs.SE

    Let the Code LLM Edit Itself When You Edit the Code

    Authors: Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Zhi Zhang, Di He

    Abstract: In this work, we investigate a typical scenario in code generation where a developer edits existing code in real time and requests a code assistant, e.g., a large language model, to re-predict the next token or next line on the fly. Naively, the LLM needs to re-encode the entire KV cache to provide an accurate prediction. However, this process is computationally expensive, especially when the sequ…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Preprint. Work in Progress

  47. arXiv:2407.02783  [pdf, ps, other]

    cs.CL cs.AI

    52B to 1T: Lessons Learned via Tele-FLM Series

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

    Abstract: Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: For the Tele-FLM-52B tech report, see also 2404.16645

  48. arXiv:2407.02350  [pdf, other]

    cs.CV

    Conceptual Codebook Learning for Vision-Language Models

    Authors: Yi Zhang, Ke Yu, Siqi Wu, Zhihai He

    Abstract: In this paper, we propose Conceptual Codebook Learning (CoCoLe), a novel fine-tuning method for vision-language models (VLMs) to address the challenge of improving the generalization capability of VLMs while fine-tuning them on downstream tasks in a few-shot setting. We recognize that visual concepts, such as textures, shapes, and colors are naturally transferable across domains and play a crucial…

    Submitted 15 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  49. arXiv:2407.01358  [pdf, other]

    cs.CL

    Evaluating Knowledge-based Cross-lingual Inconsistency in Large Language Models

    Authors: Xiaolin Xing, Zhiwei He, Haoyu Xu, Xing Wang, Rui Wang, Yu Hong

    Abstract: This paper investigates the cross-lingual inconsistencies observed in Large Language Models (LLMs), such as ChatGPT, Llama, and Baichuan, which have shown exceptional performance in various Natural Language Processing (NLP) tasks. Despite their successes, these models often exhibit significant inconsistencies when processing the same concepts across different languages. This study focuses on three…

    Submitted 1 July, 2024; originally announced July 2024.

  50. Generative Iris Prior Embedded Transformer for Iris Restoration

    Authors: Yubo Huang, Jia Wang, Peipei Li, Liuyu Xiang, Peigang Li, Zhaofeng He

    Abstract: Iris restoration from complexly degraded iris images, aiming to improve iris recognition performance, is a challenging problem. Due to the complex degradation, directly training a convolutional neural network (CNN) without prior cannot yield satisfactory results. In this work, we propose a generative iris prior embedded Transformer model (Gformer), in which we build a hierarchical encoder-decoder…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Our code is available at https://github.com/sawyercharlton/Gformer

    Journal ref: 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 2023, pp. 510-515