Showing 1–50 of 4,874 results for author: Zhang, X

  1. arXiv:2409.03487  [pdf, other]

    cs.CV

    ScreenMark: Watermarking Arbitrary Visual Content on Screen

    Authors: Xiujian Liang, Gaozhi Liu, Yichao Si, Xiaoxiao Hu, Zhenxing Qian, Xinpeng Zhang

    Abstract: Digital watermarking has demonstrated its effectiveness in protecting multimedia content. However, existing watermarking methods are predominantly tailored for specific media types, rendering them less effective for the protection of content displayed on computer screens, which is often multimodal and dynamic. Visual Screen Content (VSC) is particularly susceptible to theft and leakage via screenshots, a…

    Submitted 5 September, 2024; originally announced September 2024.

  2. arXiv:2409.03247  [pdf, other]

    cs.HC

    End User Authoring of Personalized Content Classifiers: Comparing Example Labeling, Rule Writing, and LLM Prompting

    Authors: Leijie Wang, Kathryn Yurechko, Pranati Dani, Quan Ze Chen, Amy X. Zhang

    Abstract: Existing tools for laypeople to create personal classifiers often assume a motivated user working uninterrupted in a single, lengthy session. However, users tend to engage with social media casually, with many short sessions on an ongoing, daily basis. To make creating personal classifiers for content curation easier for such users, tools should support rapid initialization and iterative refinemen…

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2409.02969  [pdf, other]

    cs.MS cs.LG math.OC

    LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch

    Authors: Xiaoyuan Zhang, Liang Zhao, Yingying Yu, Xi Lin, Zhenkun Wang, Han Zhao, Qingfu Zhang

    Abstract: Multiobjective optimization problems (MOPs) are prevalent in machine learning, with applications in multi-task learning, learning under fairness or robustness constraints, etc. Instead of reducing multiple objective functions into a scalar objective, MOPs aim to optimize for the so-called Pareto optimality or Pareto set learning, which involves optimizing more than one objective function simultane…

    Submitted 4 September, 2024; originally announced September 2024.
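
    Illustration (not taken from LibMOON): the Pareto optimality mentioned in this abstract can be shown with a small dominance check. This is a minimal sketch that assumes every objective is minimized; the names `dominates`, `pareto_front`, and `candidates` are hypothetical and are not part of the library's API.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` in every objective
    and strictly better in at least one (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors (objective_1, objective_2) for four solutions.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = pareto_front(candidates)
print(front)
```

    Here (3.0, 4.0) is dropped because (2.0, 3.0) is better in both objectives; the remaining three points trade one objective against the other, so none of them can be ranked above the rest without a preference.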

  4. arXiv:2409.02708  [pdf, other]

    cs.LG stat.ME

    Few-shot Multi-Task Learning of Linear Invariant Features with Meta Subspace Pursuit

    Authors: Chaozhi Zhang, Lin Liu, Xiaoqun Zhang

    Abstract: Data scarcity poses a serious threat to modern machine learning and artificial intelligence, as their practical success typically relies on the availability of big datasets. One effective strategy to mitigate the issue of insufficient data is to first harness information from other data sources possessing certain similarities in the study design stage, and then employ the multi-task or meta learni…

    Submitted 4 September, 2024; originally announced September 2024.

  5. arXiv:2409.02608  [pdf, other]

    cs.CV

    A Medical Multimodal Large Language Model for Pediatric Pneumonia

    Authors: Weiwei Tian, Xinyu Huang, Tianhao Cheng, Wen He, Jinwu Fang, Rui Feng, Daoying Geng, Xiaobo Zhang

    Abstract: Pediatric pneumonia is the leading cause of death among children under five years worldwide, imposing a substantial burden on affected families. Currently, there are three significant hurdles in diagnosing and treating pediatric pneumonia. Firstly, pediatric pneumonia shares similar symptoms with other respiratory diseases, making rapid and accurate differential diagnosis challenging. Secondly, pr…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 18 pages, 10 figures

  6. arXiv:2409.02492  [pdf]

    cs.CV cs.LG eess.IV

    Reliable Deep Diffusion Tensor Estimation: Rethinking the Power of Data-Driven Optimization Routine

    Authors: Jialong Li, Zhicheng Zhang, Yunwei Chen, Qiqi Lu, Ye Wu, Xiaoming Liu, QianJin Feng, Yanqiu Feng, Xinyuan Zhang

    Abstract: Diffusion tensor imaging (DTI) holds significant importance in clinical diagnosis and neuroscience research. However, conventional model-based fitting methods often suffer from sensitivity to noise, leading to decreased accuracy in estimating DTI parameters. While traditional data-driven deep learning methods have shown potential in terms of accuracy and efficiency, their limited generalization to…

    Submitted 4 September, 2024; originally announced September 2024.

  7. arXiv:2409.02396  [pdf, other]

    cs.NI eess.SP

    A Dynamic Resource Scheduling Algorithm Based on Traffic Prediction for Coexistence of eMBB and Random Arrival URLLC

    Authors: Yizhou Jiang, Xiujun Zhang, Xiaofeng Zhong, Shidong Zhou

    Abstract: In this paper, we propose a joint design for the coexistence of enhanced mobile broadband (eMBB) and ultra-reliable and random low-latency communication (URLLC) with different transmission time intervals (TTI): an eMBB scheduler operating at the beginning of each eMBB TTI to decide the coding redundancy of eMBB code blocks, and a URLLC scheduler at the beginning of each mini-slot to perform immedi…

    Submitted 3 September, 2024; originally announced September 2024.

  8. arXiv:2409.02041  [pdf, other]

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge

    Authors: Shutong Niu, Ruoyu Wang, Jun Du, Gaobin Yang, Yanhui Tu, Siyuan Wu, Shuangqing Qian, Huaxin Wu, Haitao Xu, Xueyang Zhang, Guolong Zhong, Xindi Yu, Jieru Chen, Mengzhi Wang, Di Cai, Tian Gao, Genshun Wan, Feng Ma, Jia Pan, Jianqing Gao

    Abstract: This technical report outlines our submission system for the CHiME-8 NOTSOFAR-1 Challenge. The primary difficulty of this challenge is the dataset recorded across various conference rooms, which captures real-world complexities such as high overlap rates, background noises, a variable number of speakers, and natural conversation styles. To address these issues, we optimized the system in several a…

    Submitted 3 September, 2024; originally announced September 2024.

  9. arXiv:2409.01994  [pdf, other]

    cs.SE cs.CR

    BinPRE: Enhancing Field Inference in Binary Analysis Based Protocol Reverse Engineering

    Authors: Jiayi Jiang, Xiyuan Zhang, Chengcheng Wan, Haoyi Chen, Haiying Sun, Ting Su

    Abstract: Protocol reverse engineering (PRE) aims to infer the specification of network protocols when the source code is not available. Specifically, field inference is one crucial step in PRE to infer the field formats and semantics. To perform field inference, binary analysis based PRE techniques are one major approach category. However, such techniques face two key challenges - (1) the format inference…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by ACM Conference on Computer and Communications Security (CCS) 2024

  10. Focus Agent: LLM-Powered Virtual Focus Group

    Authors: Taiyu Zhang, Xuesong Zhang, Robbe Cools, Adalberto L. Simeone

    Abstract: In the domain of Human-Computer Interaction, focus groups represent a widely utilised yet resource-intensive methodology, often demanding the expertise of skilled moderators and meticulous preparatory efforts. This study introduces the "Focus Agent," a Large Language Model (LLM) powered framework that simulates both the focus group (for data collection) and acts as a moderator in a focus group s…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 8 pages, the 24th Intelligent Virtual Agent Conference

    Journal ref: Taiyu Zhang, Xuesong Zhang, Robbe Cools, and Adalberto Simeone. 2024. Focus Agent: LLM-Powered Virtual Focus Group. In ACM International Conference on Intelligent Virtual Agents (IVA '24), September 16--19, 2024, GLASGOW, United Kingdom

  11. arXiv:2409.01704  [pdf, other]

    cs.CV

    General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

    Authors: Haoran Wei, Chenglong Liu, Jinyue Chen, Jia Wang, Lingyu Kong, Yanming Xu, Zheng Ge, Liang Zhao, Jianjian Sun, Yuang Peng, Chunrui Han, Xiangyu Zhang

    Abstract: Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's usage due to the growing demand for intelligent processing of man-made optical characters. In this paper, we collectively refer to all artificial optical signals (e.g., plain texts, math/molecular formulas, tables, charts, sheet music, and even geometric shapes) as "characters" and propose the General OCR Theory along with…

    Submitted 3 September, 2024; originally announced September 2024.

  12. arXiv:2409.01667  [pdf, other]

    cs.CV

    VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning

    Authors: Muye Huang, Lingling Zhang, Lai Han, Wenjun Wu, Xinyu Zhang, Jun Liu

    Abstract: Charts are widely used for data visualization across various fields, including education, research, and business. Chart Question Answering (CQA) is an emerging task focused on the automatic interpretation and reasoning of data presented in charts. However, chart images are inherently difficult to interpret, and chart-related questions often involve complex logical and numerical reasoning, which hi…

    Submitted 3 September, 2024; originally announced September 2024.

  13. arXiv:2409.01605  [pdf, other]

    cs.IR cs.AI

    Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information

    Authors: Xinyu Zhang, Linmei Hu, Luhao Zhang, Dandan Song, Heyan Huang, Liqiang Nie

    Abstract: Sequential recommender systems are essential for discerning user preferences from historical interactions and facilitating targeted recommendations. Recent innovations employing Large Language Models (LLMs) have advanced the field by encoding item semantics, yet they often necessitate substantial parameter tuning and are resource-demanding. Moreover, these works fail to consider the diverse chara…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 11 pages, 4 figures

  14. arXiv:2409.01577  [pdf, other]

    cs.CV

    EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

    Authors: Muye Huang, Lai Han, Xinyu Zhang, Wenjun Wu, Jie Ma, Lingling Zhang, Jun Liu

    Abstract: Chart understanding enables automated data analysis for humans, which requires models to achieve highly accurate visual comprehension. While existing Visual Language Models (VLMs) have shown progress in chart understanding, the lack of high-quality training data and comprehensive evaluation benchmarks hinders VLM chart comprehension. In this paper, we introduce EvoChart, a novel self-training meth…

    Submitted 2 September, 2024; originally announced September 2024.

  15. arXiv:2409.01327  [pdf, other]

    cs.CV

    SPDiffusion: Semantic Protection Diffusion for Multi-concept Text-to-image Generation

    Authors: Yang Zhang, Rui Zhang, Xuecheng Nie, Haochen Li, Jikun Chen, Yifan Hao, Xin Zhang, Luoqi Liu, Ling Li

    Abstract: Recent text-to-image models have achieved remarkable success in generating high-quality images. However, when tasked with multi-concept generation, which creates images containing multiple characters or objects, existing methods often suffer from attribute confusion, resulting in severe text-image inconsistency. We found that attribute confusion occurs when a certain region of the latent features a…

    Submitted 2 September, 2024; originally announced September 2024.

  16. arXiv:2409.01193  [pdf, other]

    cs.CR cs.CL cs.LG

    CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models

    Authors: Rui Zeng, Xi Chen, Yuwen Pu, Xuhong Zhang, Tianyu Du, Shouling Ji

    Abstract: Backdoors can be injected into NLP models to induce misbehavior when the input text contains a specific feature, known as a trigger, which the attacker secretly selects. Unlike fixed words, phrases, or sentences used in the static text trigger, NLP dynamic backdoor attacks design triggers associated with abstract and latent text features, making them considerably stealthier than traditional static…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: To appear in the Network and Distributed System Security (NDSS) Symposium, February, 2025

  17. arXiv:2409.01179  [pdf, other]

    cs.CV

    Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information

    Authors: Yi Chen, Jian Xu, Xu-Yao Zhang, Wen-Zhuo Liu, Yang-Yang Liu, Cheng-Lin Liu

    Abstract: With the advancement of large-scale language modeling techniques, large multimodal models combining visual encoders with large language models have demonstrated exceptional performance in various visual tasks. Most of the current large-scale multimodal models achieve this by mapping visual features obtained from the visual encoder into a large language model and using them as inputs alongside text…

    Submitted 2 September, 2024; originally announced September 2024.

  18. arXiv:2409.00839  [pdf, other]

    cs.CV cs.AI cs.IT

    Entropy Loss: An Interpretability Amplifier of 3D Object Detection Network for Intelligent Driving

    Authors: Haobo Yang, Shiyan Zhang, Zhuoyi Yang, Xinyu Zhang, Li Wang, Yifan Tang, Jilong Guo, Jun Li

    Abstract: With the increasing complexity of the traffic environment, the significance of safety perception in intelligent driving is intensifying. Traditional methods in the field of intelligent driving perception rely on deep learning, which suffers from limited interpretability, often described as a "black box." This paper introduces a novel type of loss function, termed "Entropy Loss," along with an inno…

    Submitted 1 September, 2024; originally announced September 2024.

  19. arXiv:2409.00740  [pdf, other]

    cs.CR

    VPVet: Vetting Privacy Policies of Virtual Reality Apps

    Authors: Yuxia Zhan, Yan Meng, Lu Zhou, Yichang Xiong, Xiaokuan Zhang, Lichuan Ma, Guoxing Chen, Qingqi Pei, Haojin Zhu

    Abstract: Virtual reality (VR) apps can harvest a wider range of user data than web/mobile apps running on personal computers or smartphones. Existing law and privacy regulations emphasize that VR developers should inform users of what data are collected/used/shared (CUS) through privacy policies. However, privacy policies in the VR ecosystem are still in their early stages, and many developers fail to writ…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 18 pages, 13 figures (including subfigures), 13 tables. To appear at ACM CCS 2024

  20. arXiv:2409.00657  [pdf, other]

    cs.DC

    HopGNN: Boosting Distributed GNN Training Efficiency via Feature-Centric Model Migration

    Authors: Weijian Chen, Shuibing He, Haoyang Qu, Xuechen Zhang, Dan Feng

    Abstract: Distributed training of graph neural networks (GNNs) has become a crucial technique for processing large graphs. Prevalent GNN frameworks are model-centric, necessitating the transfer of massive graph vertex features to GNN models, which leads to a significant communication bottleneck. Recognizing that the model size is often significantly smaller than the feature size, we propose LeapGNN, a featu…

    Submitted 1 September, 2024; originally announced September 2024.

  21. arXiv:2409.00649  [pdf, other]

    eess.IV cs.CV

    DeReStainer: H&E to IHC Pathological Image Translation via Decoupled Staining Channels

    Authors: Linda Wei, Shengyi Hua, Shaoting Zhang, Xiaofan Zhang

    Abstract: Breast cancer is a highly fatal disease among cancers in women, and early detection is crucial for treatment. HER2 status, a valuable diagnostic marker based on Immunohistochemistry (IHC) staining, is instrumental in determining breast cancer status. The high cost of IHC staining and the ubiquity of Hematoxylin and Eosin (H&E) staining make the conversion from H&E to IHC staining essential. In thi…

    Submitted 1 September, 2024; originally announced September 2024.

  22. arXiv:2409.00620  [pdf, other]

    cs.CV cs.AI

    Enhancing Vectorized Map Perception with Historical Rasterized Maps

    Authors: Xiaoyu Zhang, Guangwei Liu, Zihao Liu, Ningyi Xu, Yunhui Liu, Ji Zhao

    Abstract: In autonomous driving, there is growing interest in end-to-end online vectorized map perception in bird's-eye-view (BEV) space, with an expectation that it could replace traditional high-cost offline high-definition (HD) maps. However, the accuracy and robustness of these methods can be easily compromised in challenging conditions, such as occlusion or adverse weather, when relying only on onboard…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV 2024

  23. arXiv:2409.00499  [pdf, other]

    cs.RO cs.CV

    DAP: Diffusion-based Affordance Prediction for Multi-modality Storage

    Authors: Haonan Chang, Kowndinya Boyalakuntla, Yuhan Liu, Xinyu Zhang, Liam Schramm, Abdeslam Boularias

    Abstract: Solving the storage problem, where objects must be accurately placed into containers with precise orientations and positions, presents a distinct challenge that extends beyond traditional rearrangement tasks. These challenges are primarily due to the need for fine-grained 6D manipulation and the inherent multi-modality of solution spaces, where multiple viable goal configurations exist for the same st…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: Paper accepted by IROS 2024. arXiv version is 8 pages

  24. arXiv:2409.00338  [pdf, other]

    cs.LG cs.AI cs.SI

    GSpect: Spectral Filtering for Cross-Scale Graph Classification

    Authors: Xiaoyu Zhang, Wenchuan Yang, Jiawei Feng, Bitao Dai, Tianci Bu, Xin Lu

    Abstract: Identifying structures in common forms the basis for networked systems design and optimization. However, real structures represented by graphs are often of varying sizes, leading to the low accuracy of traditional graph classification methods. These graphs are called cross-scale graphs. To overcome this limitation, in this study, we propose GSpect, an advanced spectral graph filtering model for cr…

    Submitted 30 August, 2024; originally announced September 2024.

  25. arXiv:2409.00032  [pdf, other]

    eess.SP cs.CE cs.LG

    ADformer: A Multi-Granularity Transformer for EEG-Based Alzheimer's Disease Assessment

    Authors: Yihe Wang, Nadia Mammone, Darina Petrovsky, Alexandros T. Tzallas, Francesco C. Morabito, Xiang Zhang

    Abstract: Electroencephalogram (EEG) has emerged as a cost-effective and efficient method for supporting neurologists in assessing Alzheimer's disease (AD). Existing approaches predominantly utilize handcrafted features or Convolutional Neural Network (CNN)-based methods. However, the potential of the transformer architecture, which has shown promising results in various time series analysis tasks, remains…

    Submitted 17 August, 2024; originally announced September 2024.

    Comments: 17 pages main paper + 3 pages supplementary materials. This work will be submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  26. arXiv:2408.17053  [pdf, other]

    cs.LG

    Estimating Conditional Average Treatment Effects via Sufficient Representation Learning

    Authors: Pengfei Shi, Wei Zhong, Xinyu Zhang, Ningtao Wang, Xing Fu, Weiqiang Wang, Yin Jin

    Abstract: Estimating the conditional average treatment effects (CATE) is very important in causal inference and has a wide range of applications across many fields. In the estimation process of CATE, the unconfoundedness assumption is typically required to ensure the identifiability of the regression problems. When estimating CATE using high-dimensional data, there have been many variable selection methods…

    Submitted 30 August, 2024; originally announced August 2024.

  27. arXiv:2408.17027  [pdf, other]

    cs.CV

    ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images

    Authors: Xiaoshuai Zhang, Zhicheng Wang, Howard Zhou, Soham Ghosh, Danushen Gnanapragasam, Varun Jampani, Hao Su, Leonidas Guibas

    Abstract: To advance the state of the art in the creation of 3D foundation models, this paper introduces the ConDense framework for 3D pre-training utilizing existing pre-trained 2D networks and large-scale multi-view datasets. We propose a novel 2D-3D joint training scheme to extract co-embedded 2D and 3D features in an end-to-end pipeline, where 2D-3D feature consistency is enforced through a volume rende…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  28. arXiv:2408.17009  [pdf, other]

    cs.SD eess.AS

    Utilizing Speaker Profiles for Impersonation Audio Detection

    Authors: Hao Gu, JiangYan Yi, Chenglong Wang, Yong Ren, Jianhua Tao, Xinrui Yan, Yujie Chen, Xiaohui Zhang

    Abstract: Fake audio detection is an emerging active topic. A growing body of literature has aimed to detect fake utterances, which are mostly generated by Text-to-speech (TTS) or voice conversion (VC). However, countermeasures against impersonation remain an underexplored area. Impersonation is a fake type that involves an imitator replicating specific traits and speech style of a target speaker. Unlike…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

  29. arXiv:2408.16975  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Technical Report of HelixFold3 for Biomolecular Structure Prediction

    Authors: Lihang Liu, Shanzhuo Zhang, Yang Xue, Xianbin Ye, Kunrui Zhu, Yuxin Li, Yang Liu, Xiaonan Zhang, Xiaomin Fang

    Abstract: The AlphaFold series has transformed protein structure prediction with remarkable accuracy, often matching experimental methods. AlphaFold2, AlphaFold-Multimer, and the latest AlphaFold3 represent significant strides in predicting single protein chains, protein complexes, and biomolecular structures. While AlphaFold2 and AlphaFold-Multimer are open-sourced, facilitating rapid and reliable predicti…

    Submitted 29 August, 2024; originally announced August 2024.

  30. arXiv:2408.16958  [pdf, other]

    cs.LG cs.AI

    Discovery of False Data Injection Schemes on Frequency Controllers with Reinforcement Learning

    Authors: Romesh Prasad, Malik Hassanaly, Xiangyu Zhang, Abhijeet Sahu

    Abstract: While inverter-based distributed energy resources (DERs) play a crucial role in integrating renewable energy into the power system, they concurrently diminish the grid's system inertia, elevating the risk of frequency instabilities. Furthermore, smart inverters, interfaced via communication networks, pose a potential vulnerability to cyber threats if not diligently managed. To proactively fortify…

    Submitted 29 August, 2024; originally announced August 2024.

  31. arXiv:2408.16500  [pdf, other]

    cs.CV

    CogVLM2: Visual Language Models for Image and Video Understanding

    Authors: Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang

    Abstract: Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation of visual language models for image and video understanding including CogVLM2, CogVLM2-Video and GLM-4V. As an image understanding model, CogVLM2…

    Submitted 29 August, 2024; originally announced August 2024.

  32. arXiv:2408.16486  [pdf, other]

    cs.CV

    Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning

    Authors: Zhengqing Gao, Xiang Ao, Xu-Yao Zhang, Cheng-Lin Liu

    Abstract: Adapting pre-trained models to open classes is a challenging problem in machine learning. Vision-language models fully explore the knowledge of text modality, demonstrating strong zero-shot recognition performance, which is naturally suited for various open-set problems. More recently, some research focuses on fine-tuning such models to downstream tasks. Prompt tuning methods achieved huge improve…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: PRCV 2024

  33. arXiv:2408.16315  [pdf, other]

    cs.HC cs.LG eess.SP

    Passenger hazard perception based on EEG signals for highly automated driving vehicles

    Authors: Ashton Yu Xuan Tan, Yingkai Yang, Xiaofei Zhang, Bowen Li, Xiaorong Gao, Sifa Zheng, Jianqiang Wang, Xinyu Gu, Jun Li, Yang Zhao, Yuxin Zhang, Tania Stathaki

    Abstract: Enhancing the safety of autonomous vehicles is crucial, especially given recent accidents involving automated systems. As passengers in these vehicles, humans' sensory perception and decision-making can be integrated with autonomous systems to improve safety. This study explores neural mechanisms in passenger-vehicle interactions, leading to the development of a Passenger Cognitive Model (PCM) and…

    Submitted 29 August, 2024; originally announced August 2024.

  34. arXiv:2408.16277  [pdf]

    eess.IV cs.CV

    Fine-grained Classification of Port Wine Stains Using Optical Coherence Tomography Angiography

    Authors: Xiaofeng Deng, Defu Chen, Bowen Liu, Xiwan Zhang, Haixia Qiu, Wu Yuan, Hongliang Ren

    Abstract: Accurate classification of port wine stains (PWS, vascular malformations present at birth) is critical for subsequent treatment planning. However, the current method of classifying PWS based on the external skin appearance rarely reflects the underlying angiopathological heterogeneity of PWS lesions, resulting in inconsistent outcomes with the common vascular-targeted photodynamic therapy (V-PDT)…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  35. arXiv:2408.16265  [pdf, other]

    cs.CV

    Low Saturation Confidence Distribution-based Test-Time Adaptation for Cross-Domain Remote Sensing Image Classification

    Authors: Yu Liang, Xiucheng Zhang, Juepeng Zheng, Jianxi Huang, Haohuan Fu

    Abstract: Although Unsupervised Domain Adaptation (UDA) methods have improved remote sensing image classification, most of them are still limited by access to the source domain (SD) data. Designs such as Source-free Domain Adaptation (SFDA) solve the challenge of a lack of SD data; however, they still rely on a large amount of target domain data and thus cannot achieve fast adaptations…

    Submitted 29 August, 2024; originally announced August 2024.

  36. arXiv:2408.16233  [pdf, other]

    cs.CV

    PSE-Net: Channel Pruning for Convolutional Neural Networks with Parallel-subnets Estimator

    Authors: Shiguang Wang, Tao Xie, Haijun Liu, Xingcheng Zhang, Jian Cheng

    Abstract: Channel Pruning is one of the most widespread techniques used to compress deep neural networks while maintaining their performance. Currently, a typical pruning algorithm leverages neural architecture search to directly find networks with a configurable width, the key step of which is to identify a representative subnet for various pruning ratios by training a supernet. However, current methods mai…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 10 pages, Neural Networks

  37. arXiv:2408.15994  [pdf, other]

    cs.CV

    Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration

    Authors: Xu Zhang, Jiaqi Ma, Guoli Wang, Qian Zhang, Huan Zhang, Lefei Zhang

    Abstract: The limitations of task-specific and general image restoration methods for specific degradations have prompted the development of all-in-one image restoration techniques. However, the diversity of patterns among multiple degradations, along with the significant uncertainties in mapping between degraded images of different severities and their corresponding undistorted versions, pose significant chal…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 13 pages, 8 figures

  38. arXiv:2408.15844  [pdf, other]

    cs.CV cs.IT

    Shot Segmentation Based on Von Neumann Entropy for Key Frame Extraction

    Authors: Xueqing Zhang, Di Fu, Naihao Liu

    Abstract: Video key frame extraction is important in various fields, such as video summary, retrieval, and compression. Therefore, we suggest a video key frame extraction algorithm based on shot segmentation using Von Neumann entropy. The segmentation of shots is achieved through the computation of Von Neumann entropy of the similarity matrix among frames within the video sequence. The initial frame of each…

    Submitted 29 August, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

    Comments: 14 pages, 5 figures
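
    Illustration (not the paper's implementation): the abstract mentions the Von Neumann entropy of a frame similarity matrix. One common way to define it, assumed here, is to scale a symmetric positive semidefinite similarity matrix to unit trace and take the Shannon entropy of its eigenvalues. The helper names, the Jacobi eigenvalue routine, and the toy matrices below are hypothetical choices for this sketch; how the paper builds its similarity matrix or picks shot boundaries is not stated in the truncated abstract.

```python
import math
from typing import List

Matrix = List[List[float]]


def symmetric_eigenvalues(matrix: Matrix, sweeps: int = 50, tol: float = 1e-12) -> List[float]:
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal mass is negligible: diagonal holds the eigenvalues
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q]: tan(2*theta) = 2*a_pq / (a_qq - a_pp)
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def von_neumann_entropy(similarity: Matrix) -> float:
    """Entropy -sum(l * ln l) over eigenvalues l of the trace-normalized matrix."""
    trace = sum(similarity[i][i] for i in range(len(similarity)))
    rho = [[value / trace for value in row] for row in similarity]
    return -sum(lam * math.log(lam) for lam in symmetric_eigenvalues(rho) if lam > 1e-12)


# Toy similarity matrices for a window of three frames (made-up values).
identical_frames = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
unrelated_frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
low = von_neumann_entropy(identical_frames)
high = von_neumann_entropy(unrelated_frames)
```

    Under this definition a window of identical frames gives entropy 0 and a window of mutually unrelated frames gives ln(3), so a jump in entropy between neighbouring windows is one plausible signal of a shot change.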

  39. arXiv:2408.15310  [pdf, other]

    q-bio.MN cs.CE cs.LG

    RGDA-DDI: Residual graph attention network and dual-attention based framework for drug-drug interaction prediction

    Authors: Changjian Zhou, Xin Zhang, Jiafeng Li, Jia Song, Wensheng Xiang

    Abstract: Recent studies suggest that drug-drug interaction (DDI) prediction via computational approaches has significant importance for understanding the functions and co-prescriptions of multiple drugs. However, existing in silico DDI prediction methods either ignore the potential interactions among drug-drug pairs (DDPs), or fail to explicitly model and fuse the multi-scale drug feature representations…

    Submitted 27 August, 2024; originally announced August 2024.

  40. arXiv:2408.15032  [pdf, other]

    cs.CV cs.AI

    Mamba2MIL: State Space Duality Based Multiple Instance Learning for Computational Pathology

    Authors: Yuqi Zhang, Xiaoqian Zhang, Jiakai Wang, Yuancheng Yang, Taiying Peng, Chao Tong

    Abstract: Computational pathology (CPath) has significantly advanced the clinical practice of pathology. Despite the progress made, Multiple Instance Learning (MIL), a promising paradigm within CPath, continues to face challenges, particularly related to incomplete information utilization. Existing frameworks, such as those based on Convolutional Neural Networks (CNNs), attention, and selective scan space s…

    Submitted 27 August, 2024; originally announced August 2024.

  41. arXiv:2408.15018  [pdf, other]

    cs.HC cs.AI

    Cross-subject Brain Functional Connectivity Analysis for Multi-task Cognitive State Evaluation

    Authors: Jun Chen, Anqi Chen, Bingkun Jiang, Mohammad S. Obaidat, Ni Li, Xinyu Zhang

    Abstract: Cognition refers to the function of information perception and processing, which is the fundamental psychological essence of human beings. It is responsible for reasoning and decision-making, while its evaluation is significant for the aviation domain in mitigating potential safety risks. Existing studies tend to use varied methods for cognitive state evaluation yet have limitations in timeliness,…

    Submitted 27 August, 2024; originally announced August 2024.

  42. arXiv:2408.14735  [pdf, other]

    cs.MM cs.CR cs.DC

    PPVF: An Efficient Privacy-Preserving Online Video Fetching Framework with Correlated Differential Privacy

    Authors: Xianzhi Zhang, Yipeng Zhou, Di Wu, Quan Z. Sheng, Miao Hu, Linchang Xiao

    Abstract: Online video streaming has evolved into an integral component of the contemporary Internet landscape. Yet, the disclosure of user requests presents formidable privacy challenges. As users stream their preferred online videos, their requests are automatically seized by video content providers, potentially leaking users' privacy. Unfortunately, current protection methods are not well-suited to pre…

    Submitted 26 August, 2024; originally announced August 2024.

  43. arXiv:2408.14608  [pdf, other]

    cs.LG stat.ML

    Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

    Authors: Lazar Atanackovic, Xi Zhang, Brandon Amos, Mathieu Blanchette, Leo J. Lee, Yoshua Bengio, Alexander Tong, Kirill Neklyudov

    Abstract: Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g. the dynamics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the p…

    Submitted 26 August, 2024; originally announced August 2024.

  44. arXiv:2408.14397  [pdf, other]

    cs.AI cs.CL cs.CV

    Uncovering Knowledge Gaps in Radiology Report Generation Models through Knowledge Graphs

    Authors: Xiaoman Zhang, Julián N. Acosta, Hong-Yu Zhou, Pranav Rajpurkar

    Abstract: Recent advancements in artificial intelligence have significantly improved the automatic generation of radiology reports. However, existing evaluation methods fail to reveal the models' understanding of radiological images and their capacity to achieve human-level granularity in descriptions. To bridge this gap, we introduce a system, named ReXKG, which extracts structured information from process…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Code is available at: https://github.com/rajpurkarlab/ReXKG

  45. arXiv:2408.14342  [pdf, other]

    cs.CV physics.med-ph

    Dual-Domain CLIP-Assisted Residual Optimization Perception Model for Metal Artifact Reduction

    Authors: Xinrui Zhang, Ailong Cai, Shaoyu Wang, Linyuan Wang, Zhizhong Zheng, Lei Li, Bin Yan

    Abstract: Metal artifacts in computed tomography (CT) imaging pose significant challenges to accurate clinical diagnosis. The presence of high-density metallic implants results in artifacts that deteriorate image quality, manifesting in the forms of streaking, blurring, or beam hardening effects, etc. Nowadays, various deep learning-based approaches, particularly generative models, have been proposed for me…

    Submitted 29 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 14 pages, 18 figures

  46. arXiv:2408.13922  [pdf, other]

    cs.CV

    COMPOSE: Comprehensive Portrait Shadow Editing

    Authors: Andrew Hou, Zhixin Shu, Xuaner Zhang, He Zhang, Yannick Hold-Geoffroy, Jae Shin Yoon, Xiaoming Liu

    Abstract: Existing portrait relighting methods struggle with precise control over facial shadows, particularly when faced with challenges such as handling hard shadows from directional light sources or adjusting shadows while remaining in harmony with existing lighting conditions. In many situations, completely altering input lighting is undesirable for portrait retouching applications: one may want to pres…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Accepted at ECCV 2024

  47. arXiv:2408.13771  [pdf, other]

    cs.CV

    ICFRNet: Image Complexity Prior Guided Feature Refinement for Real-time Semantic Segmentation

    Authors: Xin Zhang, Teodor Boyadzhiev, Jinglei Shi, Jufeng Yang

    Abstract: In this paper, we leverage image complexity as a prior for refining segmentation features to achieve accurate real-time semantic segmentation. The design philosophy is based on the observation that different pixel regions within an image exhibit varying levels of complexity, with higher complexities posing a greater challenge for accurate segmentation. We thus introduce image complexity as prior g…

    Submitted 25 August, 2024; originally announced August 2024.

  48. Enhancing Adaptive Deep Networks for Image Classification via Uncertainty-aware Decision Fusion

    Authors: Xu Zhang, Zhipeng Xie, Haiyang Yu, Qitong Wang, Peng Wang, Wei Wang

    Abstract: Handling varying computational resources is a critical issue in modern AI applications. Adaptive deep networks, featuring the dynamic employment of multiple classifier heads among different layers, have been proposed to address classification tasks under varying computing resources. Existing approaches typically utilize the last classifier supported by the available resources for inference, as the…

    Submitted 29 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 13 pages, 27 figures. In ACM Multimedia 2024

  49. arXiv:2408.13681  [pdf, other]

    cs.CE cs.SI

    Smart Home Cyber Insurance Pricing

    Authors: Xiaoyu Zhang, Maochao Xu, Shouhuai Xu

    Abstract: Our homes are increasingly employing various kinds of Internet of Things (IoT) devices, leading to the notion of smart homes. While this trend brings convenience to our daily life, it also introduces cyber risks. To mitigate such risks, the demand for smart home cyber insurance has been growing rapidly. However, there are no studies on analyzing the competency of smart home cyber insurance policie…

    Submitted 24 August, 2024; originally announced August 2024.

  50. arXiv:2408.13460  [pdf, other]

    cs.LG cs.CR stat.ML

    DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction

    Authors: Xinwei Zhang, Zhiqi Bu, Mingyi Hong, Meisam Razaviyayn

    Abstract: Privacy is a growing concern in modern deep-learning systems and applications. Differentially private (DP) training prevents the leakage of sensitive information in the collected training data from the trained machine learning models. DP optimizers, including DP stochastic gradient descent (DPSGD) and its variants, privatize the training procedure by gradient clipping and DP noise injection. Howev…

    Submitted 24 August, 2024; originally announced August 2024.