Showing 1–50 of 970 results for author: He, J

Searching in archive cs.
  1. arXiv:2409.03404

    cs.CV cs.AI

    KAN See In the Dark

    Authors: Aoxiang Ning, Minglong Xue, Jinhong He, Chengyun Song

    Abstract: Existing low-light image enhancement methods struggle to fit the complex nonlinear relationship between normal and low-light images due to uneven illumination and noise effects. The recently proposed Kolmogorov-Arnold networks (KANs) feature spline-based convolutional layers and learnable activation functions, which can effectively capture nonlinear dependencies. In this paper, we design a KA…

    Submitted 5 September, 2024; originally announced September 2024.

  2. arXiv:2409.01966

    cs.CV

    MetaFood3D: Large 3D Food Object Dataset with Nutrition Values

    Authors: Yuhao Chen, Jiangpeng He, Chris Czarnecki, Gautham Vinod, Talha Ibn Mahmud, Siddeshwar Raghavan, Jinge Ma, Dayou Mao, Saeejith Nair, Pengcheng Xi, Alexander Wong, Edward Delp, Fengqing Zhu

    Abstract: Food computing is both important and challenging in computer vision (CV). It significantly contributes to the development of CV algorithms due to its frequent presence in datasets across various applications, ranging from classification and instance segmentation to 3D reconstruction. The polymorphic shapes and textures of food, coupled with high variation in forms and vast multimodal information,…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Dataset is coming soon

  3. arXiv:2409.01366

    cs.CL cs.AI cs.LG

    CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification

    Authors: Junhui He, Shangyu Wu, Weidong Wen, Chun Jason Xue, Qingan Li

    Abstract: Deploying large language models (LLMs) on edge devices presents significant challenges due to the substantial computational overhead and memory requirements. Activation sparsification can mitigate these challenges by reducing the number of activated neurons during inference. Existing methods typically employ thresholding-based sparsification based on the statistics of activation tensors. However,…

    Submitted 2 September, 2024; originally announced September 2024.

  4. arXiv:2409.01207

    cs.LG

    Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models

    Authors: Jiao Chen, Jiayi He, Fangfang Chen, Zuohong Lv, Jianhua Tang, Weihua Li, Zuozhu Liu, Howard H. Yang, Guangjie Han

    Abstract: Currently, most applications in the Industrial Internet of Things (IIoT) still rely on CNN-based neural networks. Although Transformer-based large models (LMs), including language, vision, and multimodal models, have demonstrated impressive capabilities in AI-generated content (AIGC), their application in industrial domains, such as detection, planning, and control, remains relatively limited. Dep…

    Submitted 2 September, 2024; originally announced September 2024.

  5. arXiv:2409.01184

    cs.CV

    PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery

    Authors: Adrito Das, Danyal Z. Khan, Dimitrios Psychogyios, Yitong Zhang, John G. Hanrahan, Francisco Vasconcelos, You Pang, Zhen Chen, Jinlin Wu, Xiaoyang Zou, Guoyan Zheng, Abdul Qayyum, Moona Mazher, Imran Razzak, Tianbin Li, Jin Ye, Junjun He, Szymon Płotka, Joanna Kaleta, Amine Yamlahi, Antoine Jund, Patrick Godau, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, et al. (7 additional authors not shown)

    Abstract: The field of computer vision applied to videos of minimally invasive surgery is ever-growing. Workflow recognition pertains to the automated recognition of various aspects of a surgery, including which surgical steps are performed and which surgical instruments are used. This information can later be used to assist clinicians when learning the surgery, during live surgery, and when writing operat…

    Submitted 2 September, 2024; originally announced September 2024.

  6. arXiv:2409.00346

    cs.CV

    SMAFormer: Synergistic Multi-Attention Transformer for Medical Image Segmentation

    Authors: Fuchen Zheng, Xuhang Chen, Weihuang Liu, Haolun Li, Yingtie Lei, Jiahui He, Chi-Man Pun, Shounjun Zhou

    Abstract: In medical image segmentation, specialized computer vision techniques, notably transformers grounded in attention mechanisms and residual networks employing skip connections, have been instrumental in advancing performance. Nonetheless, previous models often falter when segmenting small, irregularly shaped tumors. To this end, we introduce SMAFormer, an efficient, Transformer-based architecture th…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: Accepted by BIBM 2024

  7. arXiv:2409.00343

    cs.CV

    EgoHDM: An Online Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System

    Authors: Bonan Liu, Handi Yin, Manuel Kaufmann, Jinhao He, Sammy Christen, Jie Song, Pan Hui

    Abstract: We present EgoHDM, an online egocentric-inertial human motion capture (mocap), localization, and dense mapping system. Our system uses 6 inertial measurement units (IMUs) and a commodity head-mounted RGB camera. EgoHDM is the first human mocap system that offers dense scene mapping in near real-time. Further, it is fast and robust to initialize and fully closes the loop between physically plausibl…

    Submitted 5 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: Project Page: https://handiyin.github.io/EgoHDM/

  8. arXiv:2409.00133

    cs.CL cs.AI

    A Survey for Large Language Models in Biomedicine

    Authors: Chong Wang, Mengyao Li, Junjun He, Zhongruo Wang, Erfan Darzi, Zan Chen, Jin Ye, Tianbin Li, Yanzhou Su, Jing Ke, Kaili Qu, Shuxin Li, Yi Yu, Pietro Liò, Tianyun Wang, Yu Guang Wang, Yiqing Shen

    Abstract: Recent breakthroughs in large language models (LLMs) offer unprecedented natural language understanding and generation capabilities. However, existing surveys on LLMs in biomedicine often focus on specific applications or model architectures, lacking a comprehensive analysis that integrates the latest advancements across various biomedical domains. This review, based on an analysis of 484 publicat…

    Submitted 29 August, 2024; originally announced September 2024.

  9. arXiv:2409.00107

    eess.SY cs.AI cs.LG econ.GN math.OC

    Evaluating the Impact of Multiple DER Aggregators on Wholesale Energy Markets: A Hybrid Mean Field Approach

    Authors: Jun He, Andrew L. Liu

    Abstract: The integration of distributed energy resources (DERs) into wholesale energy markets can greatly enhance grid flexibility, improve market efficiency, and contribute to a more sustainable energy future. As DERs, such as solar PV panels and energy storage, proliferate, effective mechanisms are needed to ensure that small prosumers can participate meaningfully in these markets. We study a wholesa…

    Submitted 27 August, 2024; originally announced September 2024.

  10. arXiv:2408.17258

    cs.LG

    Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach

    Authors: Tong Nie, Junlin He, Yuewen Mei, Guoyang Qin, Guilong Li, Jian Sun, Wei Ma

    Abstract: The proliferation of e-commerce and urbanization has significantly intensified delivery operations in urban areas, boosting the volume and complexity of delivery demand. Data-driven predictive methods, especially those utilizing machine learning techniques, have emerged to handle these complexities in urban delivery demand management problems. One particularly pressing problem that has not yet bee…

    Submitted 30 August, 2024; originally announced August 2024.

  11. arXiv:2408.17005

    cs.RO cs.CV

    Efficient Camera Exposure Control for Visual Odometry via Deep Reinforcement Learning

    Authors: Shuyang Zhang, Jinhao He, Yilong Zhu, Jin Wu, Jie Yuan

    Abstract: The stability of visual odometry (VO) systems is undermined by degraded image quality, especially in environments with significant illumination changes. This study employs a deep reinforcement learning (DRL) framework to train agents for exposure control, aiming to enhance imaging performance in challenging conditions. A lightweight image simulator is developed to facilitate the training process,…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 8 pages, 7 figures

  12. arXiv:2408.15299

    q-bio.BM cs.AI cs.LG

    TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering

    Authors: Yiqing Shen, Zan Chen, Michail Mamalakis, Yungeng Liu, Tianbin Li, Yanzhou Su, Junjun He, Pietro Liò, Yu Guang Wang

    Abstract: The structural similarities between protein sequences and natural languages have led to parallel advancements in deep learning across both domains. While large language models (LLMs) have achieved much progress in the domain of natural language processing, their potential in protein engineering remains largely unexplored. Previous approaches have equipped LLMs with protein understanding capabiliti…

    Submitted 27 August, 2024; originally announced August 2024.

  13. arXiv:2408.14765

    cs.CV

    CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis

    Authors: Weijia Li, Jun He, Junyan Ye, Huaping Zhong, Zhimeng Zheng, Zilong Huang, Dahua Lin, Conghui He

    Abstract: Satellite-to-street view synthesis aims at generating a realistic street-view image from its corresponding satellite-view image. Although stable diffusion models have exhibited remarkable performance in a variety of image generation applications, their reliance on similar-view inputs to control the generated structure or texture restricts their application to the challenging cross-view synthesis tas…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 21 pages, 11 figures

  14. arXiv:2408.14057

    math.NA cs.DC cs.NE eess.SY nlin.CD

    Revisiting time-variant complex conjugate matrix equations with their corresponding real field time-variant large-scale linear equations, neural hypercomplex numbers space compressive approximation approach

    Authors: Jiakuang He, Dongqing Wu

    Abstract: Large-scale linear equations and high dimension have been hot topics in deep learning, machine learning, control, and scientific computing. Because of the special characteristics of the conjugate operation, time-variant complex conjugate matrix equations need to be transformed into corresponding real field time-variant large-scale linear equations. In this paper, zeroing neural dynamic models based on complex…

    Submitted 26 August, 2024; originally announced August 2024.

  15. arXiv:2408.12534

    eess.IV cs.AI cs.CV

    Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge

    Authors: Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Ershuai Wang, Qin Zhou, Ziyan Huang, Pengju Lyu, Jian He, Bo Wang

    Abstract: Organ and cancer segmentation in abdomen Computed Tomography (CT) scans is the prerequisite for precise cancer diagnosis and treatment. Most existing benchmarks and algorithms are tailored to specific cancer types, limiting their ability to provide comprehensive cancer analysis. This work presents the first international competition on abdominal organ and pan-cancer segmentation by providing a lar…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: MICCAI 2024 FLARE Challenge Summary

  16. arXiv:2408.12116

    cs.AI

    Geolocation Representation from Large Language Models are Generic Enhancers for Spatio-Temporal Learning

    Authors: Junlin He, Tong Nie, Wei Ma

    Abstract: In the geospatial domain, universal representation models are significantly less prevalent than their extensive use in natural language processing and computer vision. This discrepancy arises primarily from the high costs associated with the input of existing representation models, which often require street views and mobility data. To address this, we develop a novel, training-free method that le…

    Submitted 22 August, 2024; originally announced August 2024.

  17. arXiv:2408.10995

    cs.CL

    CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models

    Authors: Michael Reinisch, Jianfeng He, Chenxi Liao, Sauleh Ahmad Siddiqui, Bei Xiao

    Abstract: New medical treatment development requires multiple phases of clinical trials. Despite the significant human and financial costs of bringing a drug to market, less than 20% of drugs in testing will make it from the first phase to final approval. Recent literature indicates that the design of the trial protocols significantly contributes to trial performance. We investigated Clinical Trial Outcome…

    Submitted 20 August, 2024; originally announced August 2024.

  18. arXiv:2408.10703

    cs.CV

    Large Language Models for Multimodal Deformable Image Registration

    Authors: Mingrui Ma, Weijie Wang, Jie Ning, Jianfeng He, Nicu Sebe, Bruno Lepri

    Abstract: The challenge of Multimodal Deformable Image Registration (MDIR) lies in the conversion and alignment of features between images of different modalities. Generative models (GMs) cannot retain enough of the necessary information when translating from the source modality to the target one, while non-GMs struggle to align features across these two modalities. In this paper, we propose a novel coarse-to-fine MDIR framewo…

    Submitted 20 August, 2024; originally announced August 2024.

  19. arXiv:2408.10473

    cs.CL cs.LG

    Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism

    Authors: Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, Emad Barsoum

    Abstract: Pre-trained language models (PLMs) are engineered to be robust in contextual understanding and exhibit outstanding performance in various natural language processing tasks. However, their considerable size incurs significant computational and storage costs. Modern pruning strategies employ one-shot techniques to compress PLMs without the need for retraining on task-specific or otherwise general da…

    Submitted 19 August, 2024; originally announced August 2024.

  20. arXiv:2408.09357

    cs.GR cs.AI cs.SD eess.AS

    Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation

    Authors: Xukun Zhou, Fengxin Li, Ziqiao Peng, Kejian Wu, Jun He, Biao Qin, Zhaoxin Fan, Hongyan Liu

    Abstract: Audio-driven 3D face animation is increasingly vital in live streaming and augmented reality applications. While remarkable progress has been observed, most existing approaches are designed for specific individuals with predefined speaking styles, thus neglecting the adaptability to varied speaking styles. To address this limitation, this paper introduces MetaFace, a novel methodology meticulously…

    Submitted 18 August, 2024; originally announced August 2024.

  21. arXiv:2408.09330

    cs.CL

    Fostering Natural Conversation in Large Language Models with NICO: a Natural Interactive COnversation dataset

    Authors: Renliang Sun, Mengyuan Liu, Shiping Yang, Rui Wang, Junqing He, Jiaxing Zhang

    Abstract: Benefiting from diverse instruction datasets, contemporary Large Language Models (LLMs) perform effectively as AI assistants in collaborating with humans. However, LLMs still struggle to generate natural and colloquial responses in real-world applications such as chatbots and psychological counseling that require more human-like interactions. To address these limitations, we introduce NICO, a Natu…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 16 pages, 3 figures, 10 tables

  22. arXiv:2408.08321

    cs.HC cs.CV

    Can ChatGPT assist visually impaired people with micro-navigation?

    Authors: Junxian He, Shrinivas Pundlik, Gang Luo

    Abstract: Objective: Micro-navigation poses challenges for blind and visually impaired individuals. They often need to ask for sighted assistance. We explored the feasibility of utilizing ChatGPT as a virtual assistant to provide navigation directions. Methods: We created a test set of outdoor and indoor micro-navigation scenarios consisting of 113 scene images and their human-generated text descriptions. A…

    Submitted 31 July, 2024; originally announced August 2024.

  23. arXiv:2408.07790

    cs.CV

    Cropper: Vision-Language Model for Image Cropping through In-Context Learning

    Authors: Seung Hyun Lee, Junjie Ke, Yinxiao Li, Junfeng He, Steven Hickson, Katie Datsenko, Sangpil Kim, Ming-Hsuan Yang, Irfan Essa, Feng Yang

    Abstract: The goal of image cropping is to identify visually appealing crops within an image. Conventional methods rely on specialized architectures trained on specific datasets, which struggle to be adapted to new requirements. Recent breakthroughs in large vision-language models (VLMs) have enabled visual in-context learning without explicit training. However, effective strategies for vision downstream ta…

    Submitted 14 August, 2024; originally announced August 2024.

  24. arXiv:2408.07675

    cs.CV

    G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing

    Authors: Jingyi Yang, Zitong Yu, Xiuming Ni, Jia He, Hui Li

    Abstract: In videos containing spoofed faces, we may uncover the spoofing evidence based on either photometric or dynamic abnormality, or even a combination of both. Prevailing face anti-spoofing (FAS) approaches generally concentrate on the single-frame scenario; however, purely photometric-driven methods overlook the dynamic spoofing clues that may be exposed over time. This may lead FAS systems to conclude…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 11 pages, 5 figures

  25. arXiv:2408.05939

    cs.CV

    UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

    Authors: Junjie He, Yifeng Geng, Liefeng Bo

    Abstract: This paper presents UniPortrait, an innovative human image personalization framework that unifies single- and multi-ID customization with high face fidelity, extensive facial editability, free-form input description, and diverse layout generation. UniPortrait consists of only two plug-and-play modules: an ID embedding module and an ID routing module. The ID embedding module extracts versatile edit…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Tech report; Project page: https://aigcdesigngroup.github.io/UniPortrait-Page/

  26. arXiv:2408.05586

    cs.LG cs.IR

    Meta Clustering of Neural Bandits

    Authors: Yikun Ban, Yunzhe Qi, Tianxin Wei, Lihui Liu, Jingrui He

    Abstract: The contextual bandit has been identified as a powerful framework to formulate the recommendation process as a sequential decision-making process, where each item is regarded as an arm and the objective is to minimize the regret of $T$ rounds. In this paper, we study a new problem, Clustering of Neural Bandits, by extending previous work to the arbitrary reward function, to strike a balance betwee…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: KDD 2024

  27. arXiv:2408.04708

    cs.SD cs.AI eess.AS

    MulliVC: Multi-lingual Voice Conversion With Cycle Consistency

    Authors: Jiawei Huang, Chen Zhang, Yi Ren, Ziyue Jiang, Zhenhui Ye, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao

    Abstract: Voice conversion aims to modify the source speaker's voice to resemble the target speaker while preserving the original speech content. Despite notable advancements in voice conversion these days, multi-lingual voice conversion (including both monolingual and cross-lingual scenarios) has yet to be extensively studied. It faces two main challenges: 1) the considerable variability in prosody and art…

    Submitted 8 August, 2024; originally announced August 2024.

  28. arXiv:2408.04254

    cs.LG

    Generating Fine-Grained Causality in Climate Time Series Data for Forecasting and Anomaly Detection

    Authors: Dongqi Fu, Yada Zhu, Hanghang Tong, Kommy Weldemariam, Onkar Bhardwaj, Jingrui He

    Abstract: Understanding the causal interaction of time series variables can contribute to time series data analysis for many real-world applications, such as climate forecasting and extreme weather alerts. However, causal relationships are difficult to observe fully in complex real-world settings, such as spatial-temporal data from deployed sensor networks. Therefore, to capture fine-grained causal rela…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: ICML 2024 AI for Science Workshop

  29. arXiv:2408.03922

    cs.CV

    FMiFood: Multi-modal Contrastive Learning for Food Image Classification

    Authors: Xinyue Pan, Jiangpeng He, Fengqing Zhu

    Abstract: Food image classification is the fundamental step in image-based dietary assessment, which aims to estimate participants' nutrient intake from eating occasion images. A common challenge of food images is the intra-class diversity and inter-class similarity, which can significantly hinder classification performance. To address this issue, we introduce a novel multi-modal contrastive learning framew…

    Submitted 7 August, 2024; originally announced August 2024.

  30. arXiv:2408.03361

    eess.IV cs.CV

    GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

    Authors: Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J Seibel, Junjun He, Yu Qiao

    Abstract: Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop benchmarks to evaluate LVLMs' effectiveness in various medical applications. Curren…

    Submitted 9 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  31. arXiv:2408.03131

    cs.RO eess.SY

    Stochastic Trajectory Optimization for Demonstration Imitation

    Authors: Chenlin Ming, Zitong Wang, Boxuan Zhang, Xiaoming Duan, Jianping He

    Abstract: Humans often learn new skills by imitating the experts and gradually developing their proficiency. In this work, we introduce Stochastic Trajectory Optimization for Demonstration Imitation (STODI), a trajectory optimization framework for robots to imitate the shape of demonstration trajectories with improved dynamic performance. Consistent with the human learning process, demonstration imitation s…

    Submitted 6 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  32. arXiv:2408.02509

    cs.CR cs.LG cs.PL cs.SE

    Practical Attacks against Black-box Code Completion Engines

    Authors: Slobodan Jenko, Jingxuan He, Niels Mündler, Mark Vero, Martin Vechev

    Abstract: Modern code completion engines, powered by large language models, have demonstrated impressive capabilities to generate functionally correct code based on surrounding context. As these tools are extensively used by millions of developers, it is crucial to investigate their security implications. In this work, we present INSEC, a novel attack that directs code completion engines towards generating…

    Submitted 5 August, 2024; originally announced August 2024.

  33. arXiv:2408.02311

    cs.SE

    PTM4Tag+: Tag Recommendation of Stack Overflow Posts with Pre-trained Models

    Authors: Junda He, Bowen Xu, Zhou Yang, DongGyun Han, Chengran Yang, Jiakun Liu, Zhipeng Zhao, David Lo

    Abstract: Stack Overflow is one of the most influential Software Question & Answer (SQA) websites, hosting millions of programming-related questions and answers. Tags play a critical role in efficiently organizing the contents in Stack Overflow and are vital to support a range of site operations, e.g., querying relevant content. Poorly selected tags often raise problems like tag ambiguity and tag explosion.…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2203.10965

  34. arXiv:2408.01812

    cs.CV

    SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm

    Authors: Junyan Ye, Jun He, Weijia Li, Zhutao Lv, Jinhua Yu, Haote Yang, Conghui He

    Abstract: Street-to-satellite image synthesis focuses on generating realistic satellite images from corresponding ground street-view images while maintaining a consistent content layout, similar to looking down from the sky. The significant differences in perspectives create a substantial domain gap between the views, making this cross-view generation task particularly challenging. In this paper, we introdu…

    Submitted 17 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: 9 pages, 6 figures

  35. HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection

    Authors: Junwei He, Qianqian Xu, Yangbangyan Jiang, Zitai Wang, Yuchen Sun, Qingming Huang

    Abstract: With the progressive advancements in deep graph learning, out-of-distribution (OOD) detection for graph data has emerged as a critical challenge. While the efficacy of auxiliary datasets in enhancing OOD detection has been extensively studied for image and text data, such approaches have not yet been explored for graph data. Unlike Euclidean data, graph data exhibits greater diversity but lower ro…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Proceedings of the 32nd ACM International Conference on Multimedia

  36. arXiv:2407.21369

    cs.SE

    An LLM-based Readability Measurement for Unit Tests' Context-aware Inputs

    Authors: Zhichao Zhou, Yutian Tang, Yun Lin, Jingzhu He

    Abstract: Automated test techniques usually generate unit tests with higher code coverage than manual tests. However, the readability of automated tests is crucial for code comprehension and maintenance. The readability of unit tests involves many aspects. In this paper, we focus on test inputs. The central limitation of existing studies on input readability is that they focus on test code alone without ta…

    Submitted 18 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  37. arXiv:2407.21159

    cs.LG cs.CV

    Embedding Space Selection for Detecting Memorization and Fingerprinting in Generative Models

    Authors: Jack He, Jianxing Zhao, Andrew Bai, Cho-Jui Hsieh

    Abstract: In the rapidly evolving landscape of artificial intelligence, generative models such as Generative Adversarial Networks (GANs) and Diffusion Models have become cornerstone technologies, driving innovation in diverse fields from art creation to healthcare. Despite their potential, these models face the significant challenge of data memorization, which poses risks to privacy and the integrity of gen…

    Submitted 30 July, 2024; originally announced July 2024.

  38. arXiv:2407.20843

    cs.CV

    DFE-IANet: A Method for Polyp Image Classification Based on Dual-domain Feature Extraction and Interaction Attention

    Authors: Wei Wang, Jixing He, Xin Wang

    Abstract: Early detection and treatment of polyps in the gastrointestinal tract helps prevent colorectal cancer. However, there have been few studies to date on designing polyp image classification networks that balance efficiency and accuracy. This challenge is mainly attributed to the fact that polyps are similar to other pathologies and have complex features influenced by texture, color, and morph…

    Submitted 1 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by 2024 International Conference on Intelligent Computing (ICIC 2024). It can be accessed at http://poster-openaccess.com

    Journal ref: ICIC 2024, Tianjin, China, Poster Volume 1, pp.492-509, August 5-8, 2024

  39. arXiv:2407.20529

    cs.LG cs.CR

    Can LLMs be Fooled? Investigating Vulnerabilities in LLMs

    Authors: Sara Abdali, Jia He, CJ Barberan, Richard Anarfi

    Abstract: The advent of Large Language Models (LLMs) has garnered significant popularity and wielded immense power across various domains within Natural Language Processing (NLP). While their capabilities are undeniably impressive, it is crucial to identify and scrutinize their vulnerabilities, especially when those vulnerabilities can have costly consequences. One such LLM, trained to provide a concise summ…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 14 pages, 1 figure. arXiv admin note: text overlap with arXiv:2403.12503

  40. arXiv:2407.19979

    cs.CR

    Private and Secure Fuzzy Name Matching

    Authors: Harsh Kasyap, Ugur Ilker Atmaca, Carsten Maple, Graham Cormode, Jiancong He

    Abstract: Modern financial institutions rely on data for many operations, including a need to drive efficiency, enhance services and prevent financial crime. Data sharing across an organisation or between institutions can facilitate rapid, evidence-based decision making, including identifying money laundering and fraud. However, data privacy regulations impose restrictions on data sharing. Privacy-enhancing…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 13 pages

  41. FTF-ER: Feature-Topology Fusion-Based Experience Replay Method for Continual Graph Learning

    Authors: Jinhui Pang, Changqing Lin, Xiaoshuai Hao, Rong Yin, Zixuan Wang, Zhihui Zhang, Jinglin He, Huang Tai Sheng

    Abstract: Continual graph learning (CGL) is an important and challenging task that aims to extend static GNNs to dynamic task flow scenarios. As one of the mainstream CGL methods, the experience replay (ER) method receives widespread attention due to its superior performance. However, existing ER methods focus on identifying samples by feature significance or topological relevance, which limits their utiliz…

    Submitted 8 August, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM Multimedia 2024

  42. arXiv:2407.18324

    cs.LG cs.CL eess.AS q-fin.CP q-fin.ST

    AMA-LSTM: Pioneering Robust and Fair Financial Audio Analysis for Stock Volatility Prediction

    Authors: Shengkun Wang, Taoran Ji, Jianfeng He, Mariam Almutairi, Dan Wang, Linhan Wang, Min Zhang, Chang-Tien Lu

    Abstract: Stock volatility prediction is an important task in the financial industry. Recent advancements in multimodal methodologies, which integrate both textual and auditory data, have demonstrated significant improvements in this domain, such as earnings calls (Earnings calls are publicly available and often involve the management team of a public company and interested parties to discuss the company's ea…

    Submitted 3 July, 2024; originally announced July 2024.

  43. arXiv:2407.17915

    cs.CR cs.AI

    The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models

    Authors: Zihui Wu, Haichang Gao, Jianping He, Ping Wang

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities, but their power comes with significant security considerations. While extensive research has been conducted on the safety of LLMs in chat mode, the security implications of their function calling feature have been largely overlooked. This paper uncovers a critical vulnerability in the function calling process of LLMs, introduc…

    Submitted 29 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  44. arXiv:2407.17211

    cs.AI cs.NI cs.RO

    Testing Large Language Models on Driving Theory Knowledge and Skills for Connected Autonomous Vehicles

    Authors: Zuoyin Tang, Jianhua He, Dashuai Pei, Kezhong Liu, Tao Gao

    Abstract: Handling long-tail corner cases is a major challenge faced by autonomous vehicles (AVs). While large language models (LLMs) hold great potential to handle the corner cases with excellent generalization and explanation capabilities, and have received increasing research interest for application to autonomous driving, there are still technical barriers to be tackled, such as strict model performance and h…

    Submitted 24 July, 2024; originally announced July 2024.

  45. arXiv:2407.16719

    cs.OH

    A Brief Discussion on the Philosophical Principles and Development Directions of Data Circulation

    Authors: Zhi Li, Lei Zhang, Junyi Xin, Jianfei He, Yan Li, Zhenjun Ma, Qi Sun

    Abstract: The data circulation is a complex scenario involving a large number of participants and different types of requirements, which not only has to comply with the laws and regulations, but also faces multiple challenges in technical and business areas. In order to systematically and comprehensively address these issues, it is essential to have a comprehensive and profound understanding of 'data circul…

    Submitted 23 July, 2024; originally announced July 2024.

  46. arXiv:2407.15791

    cs.CV

    RADA: Robust and Accurate Feature Learning with Domain Adaptation

    Authors: Jingtai He, Gehao Zhang, Tingting Liu, Songlin Du

    Abstract: Recent advancements in keypoint detection and descriptor extraction have shown impressive performance in local feature learning tasks. However, existing methods generally exhibit suboptimal performance under extreme conditions such as significant appearance changes and domain shifts. In this study, we introduce a multi-level feature aggregation network that incorporates two pivotal components to f…

    Submitted 22 July, 2024; originally announced July 2024.

  47. arXiv:2407.13690

    cs.CL cs.AI

    DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving

    Authors: Yuxuan Tong, Xiwen Zhang, Rui Wang, Ruidong Wu, Junxian He

    Abstract: Solving mathematical problems requires advanced reasoning abilities and presents notable challenges for large language models. Previous works usually synthesize data from proprietary models to augment existing datasets, followed by instruction tuning to achieve top-tier results. However, our analysis of these datasets reveals severe biases towards easy queries, with frequent failures to generate a…

    Submitted 18 June, 2024; originally announced July 2024.

    Comments: Preprint. Data and model checkpoints are available at https://github.com/hkust-nlp/dart-math

  48. arXiv:2407.12854

    cs.CL cs.AI cs.IR cs.LG

    Scaling Retrieval-Based Language Models with a Trillion-Token Datastore

    Authors: Rulin Shao, Jacqueline He, Akari Asai, Weijia Shi, Tim Dettmers, Sewon Min, Luke Zettlemoyer, Pang Wei Koh

    Abstract: Scaling laws with respect to the amount of training data and the number of parameters allow us to predict the cost-benefit trade-offs of pretraining language models (LMs) in different configurations. In this paper, we consider another dimension of scaling: the amount of data available at inference time. Specifically, we find that increasing the size of the datastore used by a retrieval-based LM mo…

    Submitted 9 July, 2024; originally announced July 2024.

  49. arXiv:2407.12538

    eess.IV cs.CV

    High Frequency Matters: Uncertainty Guided Image Compression with Wavelet Diffusion

    Authors: Juan Song, Jiaxiang He, Mingtao Feng, Keyan Wang, Yunsong Li, Ajmal Mian

    Abstract: Diffusion probabilistic models have recently achieved remarkable success in generating high-quality images. However, balancing high perceptual quality and low distortion remains challenging in image compression applications. To address this issue, we propose an efficient Uncertainty-Guided image compression approach with wavelet Diffusion (UGDiff). Our approach focuses on high frequency compressio…

    Submitted 17 July, 2024; originally announced July 2024.

  50. arXiv:2407.10759

    eess.AS cs.CL cs.LG

    Qwen2-Audio Technical Report

    Authors: Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, Yuanjun Lv, Jinzheng He, Junyang Lin, Chang Zhou, Jingren Zhou

    Abstract: We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data an…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: https://github.com/QwenLM/Qwen2-Audio. Checkpoints, code and scripts will be open-sourced soon