Showing 1–50 of 8,517 results for author: Zhang, Y

Searching in archive cs.
  1. arXiv:2409.03741  [pdf, other]

    cs.LG cs.CR

    Understanding Data Importance in Machine Learning Attacks: Does Valuable Data Pose Greater Harm?

    Authors: Rui Wen, Michael Backes, Yang Zhang

    Abstract: Machine learning has revolutionized numerous domains, playing a crucial role in driving advancements and enabling data-centric processes. The significance of data in training models and shaping their performance cannot be overstated. Recent research has highlighted the heterogeneous impact of individual data samples, particularly the presence of valuable data that significantly contributes to the…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: To Appear in Network and Distributed System Security (NDSS) Symposium 2025

  2. arXiv:2409.03650  [pdf, other]

    cs.LG cs.CL

    On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization

    Authors: Yong Lin, Skyler Seto, Maartje ter Hoeve, Katherine Metcalf, Barry-John Theobald, Xuan Wang, Yizhe Zhang, Chen Huang, Tong Zhang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is an effective approach for aligning language models to human preferences. Central to RLHF is learning a reward function for scoring human preferences. Two main approaches for learning a reward model are 1) training an EXplicit Reward Model (EXRM) as in RLHF, and 2) using an implicit reward learned from preference data through methods such as Dire…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 12 pages, 8 tables, 2 figures

  3. arXiv:2409.03597  [pdf, other]

    cs.SD cs.AI eess.AS

    Multimodal Laryngoscopic Video Analysis for Assisted Diagnosis of Vocal Cord Paralysis

    Authors: Yucong Zhang, Xin Zou, Jinshan Yang, Wenjun Chen, Faya Liang, Ming Li

    Abstract: This paper presents the Multimodal Analyzing System for Laryngoscope (MASL), a system that combines audio and video data to automatically extract key segments and metrics from laryngeal videostroboscopic videos for clinical assessment. MASL integrates glottis detection with keyword spotting to analyze patient vocalizations and refine video highlights for better inspection of vocal cord movements.…

    Submitted 5 September, 2024; originally announced September 2024.

  4. arXiv:2409.03512  [pdf, other]

    cs.CY cs.CL

    From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

    Authors: Jifan Yu, Zheyuan Zhang, Daniel Zhang-li, Shangqing Tu, Zhanxin Hao, Rui Miao Li, Haoxuan Li, Yuanchun Wang, Hanming Li, Linlu Gong, Jie Cao, Jiayin Lin, Jinchang Zhou, Fei Qin, Haohua Wang, Jianxiao Jiang, Lijun Deng, Yisi Zhan, Chaojun Xiao, Xusheng Dai, Xuan Yan, Nianyi Lin, Nan Zhang, Ruixin Ni, Yang Dang, et al. (8 additional authors not shown)

    Abstract: Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still holds significant potential for improvement, new AI technologies have been continuously integ…

    Submitted 5 September, 2024; originally announced September 2024.

  5. arXiv:2409.03370  [pdf, ps, other]

    cs.IT

    Identification of non-causal systems with arbitrary switching modes

    Authors: Yanxin Zhang, Chengpu Yu, Filippo Fabiani

    Abstract: We consider the identification of non-causal systems with arbitrary switching modes (NCS-ASM), a class of models essential for describing typical power load management and department store inventory dynamics. The simultaneous identification of causal-and-anticausal subsystems, along with the presence of possibly random switching sequences, however, make the overall identification problem particula…

    Submitted 5 September, 2024; originally announced September 2024.

  6. arXiv:2409.03344  [pdf, other]

    cs.CR

    Rethinking Improved Privacy-Utility Trade-off with Pre-existing Knowledge for DP Training

    Authors: Yu Zheng, Wenchao Zhang, Yonggang Zhang, Wei Song, Kai Zhou, Bo Han

    Abstract: Differential privacy (DP) provides a provable framework for protecting individuals by customizing a random mechanism over a privacy-sensitive dataset. Deep learning models have demonstrated privacy risks in model exposure as an established learning model unintentionally records membership-level privacy leakage. Differentially private stochastic gradient descent (DP-SGD) has been proposed to safeg…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 13 pages

  7. arXiv:2409.03271  [pdf, other]

    cs.AI cs.CL cs.HC

    Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation

    Authors: Yu Wang, Shiwan Zhao, Zhihu Wang, Heyuan Huang, Ming Fan, Yubo Zhang, Zhixing Wang, Haijun Wang, Ting Liu

    Abstract: The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs). However, despite their widespread adoption and success, CoT methods often exhibit instability due to their inability to consistently ensure the quality of generated reasoning paths, leading to sub-optimal reasoning performance. To address this challenge,…

    Submitted 5 September, 2024; originally announced September 2024.

  8. arXiv:2409.03190  [pdf, other]

    cs.CV cs.GR

    Mastoidectomy Multi-View Synthesis from a Single Microscopy Image

    Authors: Yike Zhang, Jack Noble

    Abstract: Cochlear Implant (CI) procedures involve performing an invasive mastoidectomy to insert an electrode array into the cochlea. In this paper, we introduce a novel pipeline that is capable of generating synthetic multi-view videos from a single CI microscope image. In our approach, we use a patient's pre-operative CT scan to predict the post-mastoidectomy surface using a method designed for this purp…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: Submitted to Medical Imaging 2025: Image-Guided Procedures, Robotic Interventions, and Modeling

  9. arXiv:2409.02834  [pdf, other]

    cs.CL

    CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models

    Authors: Wentao Liu, Qianjun Pan, Yi Zhang, Zhuo Liu, Ji Wu, Jie Zhou, Aimin Zhou, Qin Chen, Bo Jiang, Liang He

    Abstract: Large language models (LLMs) have obtained promising results in mathematical reasoning, which is a foundational skill for human intelligence. Most previous studies focus on improving and measuring the performance of LLMs based on textual math reasoning datasets (e.g., MATH, GSM8K). Recently, a few researchers have released English multimodal math datasets (e.g., MATHVISTA and MATH-V) to evaluate t…

    Submitted 4 September, 2024; originally announced September 2024.

  10. arXiv:2409.02648  [pdf, other]

    cond-mat.mtrl-sci cs.CV

    Creating a Microstructure Latent Space with Rich Material Information for Multiphase Alloy Design

    Authors: Xudong Ma, Yuqi Zhang, Chenchong Wang, Ming Wang, Mingxin Huang, Wei Xu

    Abstract: The intricate microstructure serves as the cornerstone for the composition/processing-structure-property (CPSP) connection in multiphase alloys. Traditional alloy design methods often overlook microstructural details, which diminishes the reliability and effectiveness of the outcomes. This study introduces an improved alloy design algorithm that integrates authentic microstructural information to…

    Submitted 4 September, 2024; originally announced September 2024.

  11. arXiv:2409.02418  [pdf, other]

    cs.CV

    MOSMOS: Multi-organ segmentation facilitated by medical report supervision

    Authors: Weiwei Tian, Xinyu Huang, Junlin Hou, Caiyue Ren, Longquan Jiang, Rui-Wei Zhao, Gang Jin, Yuejie Zhang, Daoying Geng

    Abstract: Owing to a large amount of multi-modal data in modern medical systems, such as medical images and reports, Medical Vision-Language Pre-training (Med-VLP) has demonstrated incredible achievements in coarse-grained downstream tasks (i.e., medical classification, retrieval, and visual question answering). However, the problem of transferring knowledge learned from Med-VLP to fine-grained multi-organ…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 14 pages, 7 figures

  12. arXiv:2409.02322  [pdf, other]

    cs.LG cs.AI

    TimeDiT: General-purpose Diffusion Transformers for Time Series Foundation Model

    Authors: Defu Cao, Wen Ye, Yizhou Zhang, Yan Liu

    Abstract: With recent advances in building foundation models for texts and video data, there is a surge of interest in foundation models for time series. A family of models have been developed, utilizing a temporal auto-regressive generative Transformer architecture, whose effectiveness has been proven in Large Language Models. While the empirical results are promising, almost all existing time series found…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 23 pages, 6 figures, 11 tables. First presented at the ICML 2024 Workshop on Foundation Models in the Wild

  13. arXiv:2409.02095  [pdf, other]

    cs.CV cs.AI cs.GR

    DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

    Authors: Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, Ying Shan

    Abstract: Despite significant advancements in monocular depth estimation for static images, estimating video depth in the open world remains challenging, since open-world videos are extremely diverse in content, motion, camera movement, and length. We present DepthCrafter, an innovative method for generating temporally consistent long depth sequences with intricate details for open-world videos, without req…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Project webpage: https://depthcrafter.github.io

  14. arXiv:2409.02046  [pdf, other]

    cs.CV

    Human-AI Collaborative Multi-modal Multi-rater Learning for Endometriosis Diagnosis

    Authors: Hu Wang, David Butler, Yuan Zhang, Jodie Avery, Steven Knox, Congbo Ma, Louise Hull, Gustavo Carneiro

    Abstract: Endometriosis, affecting about 10% of individuals assigned female at birth, is challenging to diagnose and manage. Diagnosis typically involves the identification of various signs of the disease using either laparoscopic surgery or the analysis of T1/T2 MRI images, with the latter being quicker and cheaper but less accurate. A key diagnostic sign of endometriosis is the obliteration of the Pouch…

    Submitted 5 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

  15. arXiv:2409.02038  [pdf, other]

    cs.CL cs.AI cs.DB

    BEAVER: An Enterprise Benchmark for Text-to-SQL

    Authors: Peter Baile Chen, Fabian Wenz, Yi Zhang, Moe Kayali, Nesime Tatbul, Michael Cafarella, Çağatay Demiralp, Michael Stonebraker

    Abstract: Existing text-to-SQL benchmarks have largely been constructed using publicly available tables from the web with human-generated tests containing question and SQL statement pairs. They typically show very good results and lead people to think that LLMs are effective at text-to-SQL tasks. In this paper, we apply off-the-shelf LLMs to a benchmark containing enterprise data warehouse data. In this env…

    Submitted 3 September, 2024; originally announced September 2024.

  16. arXiv:2409.01908  [pdf, other]

    stat.ME cs.LG q-fin.ST stat.AP stat.ML

    Bayesian CART models for aggregate claim modeling

    Authors: Yaojun Zhang, Lanpeng Ji, Georgios Aivaliotis, Charles C. Taylor

    Abstract: This paper proposes three types of Bayesian CART (or BCART) models for aggregate claim amount, namely, frequency-severity models, sequential models and joint models. We propose a general framework for the BCART models applicable to data with multivariate responses, which is particularly useful for the joint BCART models with a bivariate response: the number of claims and aggregate claim amount. To…

    Submitted 3 September, 2024; originally announced September 2024.

  17. arXiv:2409.01816  [pdf, other]

    cs.CV

    GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection

    Authors: Jinqing Zhang, Yanan Zhang, Yunlong Qi, Zehua Fu, Qingjie Liu, Yunhong Wang

    Abstract: Bird's-Eye-View (BEV) representation has emerged as a mainstream paradigm for multi-view 3D object detection, demonstrating impressive perceptual capabilities. However, existing methods overlook the geometric quality of BEV representation, leaving it in a low-resolution state and failing to restore the authentic geometric information of the scene. In this paper, we identify the reasons why previou…

    Submitted 3 September, 2024; originally announced September 2024.

  18. arXiv:2409.01661  [pdf, other]

    cs.CR cs.CV cs.LG

    $S^2$NeRF: Privacy-preserving Training Framework for NeRF

    Authors: Bokang Zhang, Yanglin Zhang, Zhikun Zhang, Jinglan Yang, Lingying Huang, Junfeng Wu

    Abstract: Neural Radiance Fields (NeRF) have revolutionized 3D computer vision and graphics, facilitating novel view synthesis and influencing sectors like extended reality and e-commerce. However, NeRF's dependence on extensive data collection, including sensitive scene image data, introduces significant privacy risks when users upload this data for model training. To address this concern, we first propose…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: To appear in the ACM Conference on Computer and Communications Security (CCS'24), October 14-18, 2024, Salt Lake City, UT, USA

  19. arXiv:2409.01659  [pdf, other]

    cs.CL

    Interpreting and Improving Large Language Models in Arithmetic Calculation

    Authors: Wei Zhang, Chaoqun Wan, Yonggang Zhang, Yiu-ming Cheung, Xinmei Tian, Xu Shen, Jieping Ye

    Abstract: Large language models (LLMs) have demonstrated remarkable potential across numerous applications and have shown an emergent ability to tackle complex reasoning tasks, such as mathematical computations. However, even for the simplest arithmetic calculations, the intrinsic mechanisms behind LLMs remain mysterious, making it challenging to ensure reliability. In this work, we delve into uncovering a…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by ICML 2024 (oral)

  20. arXiv:2409.01658  [pdf, other]

    cs.CL

    From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning

    Authors: Wei Chen, Zhen Huang, Liang Xie, Binbin Lin, Houqiang Li, Le Lu, Xinmei Tian, Deng Cai, Yonggang Zhang, Wenxiao Wan, Xu Shen, Jieping Ye

    Abstract: Large Language Models (LLMs) tend to prioritize adherence to user prompts over providing veracious responses, leading to the sycophancy issue. When challenged by users, LLMs tend to admit mistakes and provide inaccurate responses even if they initially provided the correct answer. Recent works propose to employ supervised fine-tuning (SFT) to mitigate the sycophancy issue, while it typically leads…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by ICML 2024

  21. arXiv:2409.01557  [pdf, other]

    cs.CV

    TASL-Net: Tri-Attention Selective Learning Network for Intelligent Diagnosis of Bimodal Ultrasound Video

    Authors: Chengqian Zhao, Zhao Yao, Zhaoyu Hu, Yuanxin Xie, Yafang Zhang, Yuanyuan Wang, Shuo Li, Jianhua Zhou, Jianqiao Zhou, Yin Wang, Jinhua Yu

    Abstract: In the intelligent diagnosis of bimodal (gray-scale and contrast-enhanced) ultrasound videos, medical domain knowledge such as the way sonographers browse videos, the particular areas they emphasize, and the features they pay special attention to, plays a decisive role in facilitating precise diagnosis. Embedding medical knowledge into the deep learning network can not only enhance performance but…

    Submitted 2 September, 2024; originally announced September 2024.

  22. arXiv:2409.01515  [pdf, other]

    cs.CY

    METcross: A framework for short-term forecasting of cross-city metro passenger flow

    Authors: Wenbo Lu, Jinhua Xu, Peikun Li, Ting Wang, Yong Zhang

    Abstract: Metro operation management relies on accurate predictions of passenger flow in the future. This study begins by integrating cross-city (including source and target city) knowledge and developing a short-term passenger flow prediction framework (METcross) for the metro. Firstly, we propose a basic framework for modeling cross-city metro passenger flow prediction from the perspectives of data fusion…

    Submitted 2 September, 2024; originally announced September 2024.

  23. arXiv:2409.01380  [pdf, other]

    cs.CR cs.CL

    Membership Inference Attacks Against In-Context Learning

    Authors: Rui Wen, Zheng Li, Michael Backes, Yang Zhang

    Abstract: Adapting Large Language Models (LLMs) to specific tasks introduces concerns about computational efficiency, prompting an exploration of efficient methods such as In-Context Learning (ICL). However, the vulnerability of ICL to privacy attacks under realistic assumptions remains largely unexplored. In this work, we present the first membership inference attack tailored for ICL, relying solely on gen…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: To Appear in the ACM Conference on Computer and Communications Security, October 14-18, 2024

  24. arXiv:2409.01327  [pdf, other]

    cs.CV

    SPDiffusion: Semantic Protection Diffusion for Multi-concept Text-to-image Generation

    Authors: Yang Zhang, Rui Zhang, Xuecheng Nie, Haochen Li, Jikun Chen, Yifan Hao, Xin Zhang, Luoqi Liu, Ling Li

    Abstract: Recent text-to-image models have achieved remarkable success in generating high-quality images. However, when tasked with multi-concept generation which creates images containing multiple characters or objects, existing methods often suffer from attribute confusion, resulting in severe text-image inconsistency. We found that attribute confusion occurs when a certain region of the latent features a…

    Submitted 2 September, 2024; originally announced September 2024.

  25. arXiv:2409.01236  [pdf, other]

    cs.CV

    Spatial-Aware Conformal Prediction for Trustworthy Hyperspectral Image Classification

    Authors: Kangdao Liu, Tianhao Sun, Hao Zeng, Yongshan Zhang, Chi-Man Pun, Chi-Man Vong

    Abstract: Hyperspectral image (HSI) classification involves assigning specific labels to each pixel to identify various land cover categories. Although deep classifiers have shown high predictive accuracy in this field, quantifying their uncertainty remains a significant challenge, which hinders their application in critical contexts. This study first theoretically evaluates the applicability of \textit{Con…

    Submitted 2 September, 2024; originally announced September 2024.

  26. arXiv:2409.01192  [pdf, other]

    cs.IR

    SSD4Rec: A Structured State Space Duality Model for Efficient Sequential Recommendation

    Authors: Haohao Qu, Yifeng Zhang, Liangbo Ning, Wenqi Fan, Qing Li

    Abstract: Sequential recommendation methods are crucial in modern recommender systems for their remarkable capability to understand a user's changing interests based on past interactions. However, a significant challenge faced by current methods (e.g., RNN- or Transformer-based models) is to effectively and efficiently capture users' preferences by modeling long behavior sequences, which impedes their vario…

    Submitted 2 September, 2024; originally announced September 2024.

  27. arXiv:2409.01184  [pdf, other]

    cs.CV

    PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery

    Authors: Adrito Das, Danyal Z. Khan, Dimitrios Psychogyios, Yitong Zhang, John G. Hanrahan, Francisco Vasconcelos, You Pang, Zhen Chen, Jinlin Wu, Xiaoyang Zou, Guoyan Zheng, Abdul Qayyum, Moona Mazher, Imran Razzak, Tianbin Li, Jin Ye, Junjun He, Szymon Płotka, Joanna Kaleta, Amine Yamlahi, Antoine Jund, Patrick Godau, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, et al. (7 additional authors not shown)

    Abstract: The field of computer vision applied to videos of minimally invasive surgery is ever-growing. Workflow recognition pertains to the automated recognition of various aspects of a surgery: including which surgical steps are performed; and which surgical instruments are used. This information can later be used to assist clinicians when learning the surgery; during live surgery; and when writing operat…

    Submitted 2 September, 2024; originally announced September 2024.

  28. arXiv:2409.01156  [pdf, other]

    cs.CV

    TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval

    Authors: Leqi Shen, Tianxiang Hao, Sicheng Zhao, Yifeng Zhang, Pengzhang Liu, Yongjun Bao, Guiguang Ding

    Abstract: Most text-video retrieval methods utilize the text-image pre-trained CLIP as a backbone, incorporating complex modules that result in high computational overhead. As a result, many studies focus on efficient fine-tuning. The primary challenge in efficient adaption arises from the inherent differences between image and video modalities. Each sampled video frame must be processed by the image encode…

    Submitted 2 September, 2024; originally announced September 2024.

  29. arXiv:2409.01092  [pdf, other]

    cs.ET cs.AI cs.NI

    Two-Timescale Synchronization and Migration for Digital Twin Networks: A Multi-Agent Deep Reinforcement Learning Approach

    Authors: Wenshuai Liu, Yaru Fu, Yongna Guo, Fu Lee Wang, Wen Sun, Yan Zhang

    Abstract: Digital twins (DTs) have emerged as a promising enabler for representing the real-time states of physical worlds and realizing self-sustaining systems. In practice, DTs of physical devices, such as mobile users (MUs), are commonly deployed in multi-access edge computing (MEC) networks for the sake of reducing latency. To ensure the accuracy and fidelity of DTs, it is essential for MUs to regularly…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 15 pages, 14 figures

    ACM Class: C.2.3; C.2.4

  30. arXiv:2409.00924  [pdf, other]

    cs.CV

    MedSAM-U: Uncertainty-Guided Auto Multi-Prompt Adaptation for Reliable MedSAM

    Authors: Nan Zhou, Ke Zou, Kai Ren, Mengting Luo, Linchao He, Meng Wang, Yidi Chen, Yi Zhang, Hu Chen, Huazhu Fu

    Abstract: The Medical Segment Anything Model (MedSAM) has shown remarkable performance in medical image segmentation, drawing significant attention in the field. However, its sensitivity to varying prompt types and locations poses challenges. This paper addresses these challenges by focusing on the development of reliable prompts that enhance MedSAM's accuracy. We introduce MedSAM-U, an uncertainty-guided f…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 10 pages, 4 figures

  31. arXiv:2409.00917  [pdf, other]

    cs.CV

    Large Scale Unsupervised Brain MRI Image Registration Solution for Learn2Reg 2024

    Authors: Yuxi Zhang, Xiang Chen, Jiazheng Wang, Min Liu, Yaonan Wang, Dongdong Liu, Renjiu Hu, Hang Zhang

    Abstract: In this paper, we summarize the methods and experimental results we proposed for Task 2 in the learn2reg 2024 Challenge. This task focuses on unsupervised registration of anatomical structures in brain MRI images between different patients. The difficulty lies in (1) the absence of segmentation labels and (2) the large amount of data. To address these challenges, we built an efficient backbone network an…

    Submitted 4 September, 2024; v1 submitted 1 September, 2024; originally announced September 2024.

    Comments: MICCAI Learn2Reg 2024 Challenge & WBIR 2024 Workshop on Biomedical Imaging Registration

  32. arXiv:2409.00695  [pdf, other]

    cs.CV cs.AI

    Curriculum Prompting Foundation Models for Medical Image Segmentation

    Authors: Xiuqi Zheng, Yuhang Zhang, Haoran Zhang, Hongrui Liang, Xueqi Bao, Zhuqing Jiang, Qicheng Lao

    Abstract: Adapting large pre-trained foundation models, e.g., SAM, for medical image segmentation remains a significant challenge. A crucial step involves the formulation of a series of specialized prompts that incorporate specific clinical instructions. Past works have been heavily reliant on a singular type of prompt for each instance, necessitating manual input of an ideally correct prompt, which is less…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Accepted by MICCAI 2024

  33. arXiv:2409.00617  [pdf, other]

    cs.CL cs.AI

    Does Knowledge Localization Hold True? Surprising Differences Between Entity and Relation Perspectives in Language Models

    Authors: Yifan Wei, Xiaoyan Yu, Yixuan Weng, Huanhuan Ma, Yuanzhe Zhang, Jun Zhao, Kang Liu

    Abstract: Large language models encapsulate knowledge and have demonstrated superior performance on various natural language processing tasks. Recent studies have localized this knowledge to specific model parameters, such as the MLP weights in intermediate layers. This study investigates the differences between entity and relational knowledge through knowledge editing. Our findings reveal that entity and r…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: CIKM 2024

  34. arXiv:2409.00606  [pdf, other]

    cs.CV

    Style Transfer: From Stitching to Neural Networks

    Authors: Xinhe Xu, Zhuoer Wang, Yihan Zhang, Yizhou Liu, Zhaoyue Wang, Zhihao Xu, Muhan Zhao

    Abstract: This article compares two style transfer methods in image processing: the traditional method, which synthesizes new images by stitching together small patches from existing images, and a modern machine learning-based approach that uses a segmentation network to isolate foreground objects and apply style transfer solely to the background. The traditional method excels in creating artistic abstracti…

    Submitted 1 September, 2024; originally announced September 2024.

  35. arXiv:2409.00314  [pdf, other]

    cs.CV

    Towards Secure and Usable 3D Assets: A Novel Framework for Automatic Visible Watermarking

    Authors: Gursimran Singh, Tianxi Hu, Mohammad Akbari, Qiang Tang, Yong Zhang

    Abstract: 3D models, particularly AI-generated ones, have witnessed a recent surge across various industries such as entertainment. Hence, there is an alarming need to protect the intellectual property and avoid the misuse of these valuable assets. As a viable solution to address these concerns, we rigorously define the novel task of automated 3D visible watermarking in terms of two competing aspects: water…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: Accepted to WACV2025

  36. arXiv:2409.00287  [pdf, other]

    cs.DC

    Benchmarking the Performance of Large Language Models on the Cerebras Wafer Scale Engine

    Authors: Zuoning Zhang, Dhruv Parikh, Youning Zhang, Viktor Prasanna

    Abstract: Transformer based Large Language Models (LLMs) have recently reached state of the art performance in Natural Language Processing (NLP) and Computer Vision (CV) domains. LLMs use the Multi-Headed Self-Attention (MHSA) mechanism to capture long-range global attention relationships among input words or image patches, drastically improving its performance over prior deep learning approaches. In this p…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: IEEE HPEC 2024

  37. arXiv:2408.17397  [pdf, other]

    cs.IT eess.SP

    End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework

    Authors: Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

    Abstract: This paper addresses the problem of end-to-end (E2E) design of learning and communication in a task-oriented semantic communication system. In particular, we consider a multi-device cooperative edge inference system over a wireless multiple-input multiple-output (MIMO) multiple access channel, where multiple devices transmit extracted features to a server to perform a classification task. We formu…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: major revision in IEEE JSAC

  38. arXiv:2408.17285  [pdf, other]

    cs.CR cs.LG

    Image-Perfect Imperfections: Safety, Bias, and Authenticity in the Shadow of Text-To-Image Model Evolution

    Authors: Yixin Wu, Yun Shen, Michael Backes, Yang Zhang

    Abstract: Text-to-image models, such as Stable Diffusion (SD), undergo iterative updates to improve image quality and address concerns such as safety. Improvements in image quality are straightforward to assess. However, how model updates resolve existing concerns and whether they raise new questions remain unexplored. This study takes an initial step in investigating the evolution of text-to-image models f…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: To Appear in the ACM Conference on Computer and Communications Security, October 14-18, 2024

  39. arXiv:2408.17168  [pdf, other]

    cs.CV

    EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs

    Authors: Zhen Fan, Peng Dai, Zhuo Su, Xu Gao, Zheng Lv, Jiarui Zhang, Tianyuan Du, Guidong Wang, Yang Zhang

    Abstract: Egocentric human pose estimation (HPE) using wearable sensors is essential for VR/AR applications. Most methods rely solely on either egocentric-view images or sparse Inertial Measurement Unit (IMU) signals, leading to inaccuracies due to self-occlusion in images or the sparseness and drift of inertial sensors. Most importantly, the lack of real-world datasets containing both modalities is a major…

    Submitted 30 August, 2024; originally announced August 2024.

  40. arXiv:2408.17154  [pdf, other]

    cs.CV

    Self-supervised Anomaly Detection Pretraining Enhances Long-tail ECG Diagnosis

    Authors: Aofan Jiang, Chaoqin Huang, Qing Cao, Yuchen Xu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

    Abstract: Current computer-aided ECG diagnostic systems struggle with the underdetection of rare but critical cardiac anomalies due to the imbalanced nature of ECG datasets. This study introduces a novel approach using self-supervised anomaly detection pretraining to address this limitation. The anomaly detection model is specifically designed to detect and localize subtle deviations from normal cardiac pat…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2404.04935

  41. arXiv:2408.17052  [pdf, other]

    cs.CV

    Can We Leave Deepfake Data Behind in Training Deepfake Detector?

    Authors: Jikang Cheng, Zhiyuan Yan, Ying Zhang, Yuhao Luo, Zhongyuan Wang, Chen Li

    Abstract: The generalization ability of deepfake detectors is vital for their applications in real-world scenarios. One effective solution to enhance this ability is to train the models with manually-blended data, which we termed "blendfake", encouraging models to learn generic forgery artifacts like blending boundary. Interestingly, current SoTA methods utilize blendfake without incorporating any deepfake…

    Submitted 30 August, 2024; originally announced August 2024.

  42. arXiv:2408.17042  [pdf, other]

    cs.DS

    E-Graphs as Circuits, and Optimal Extraction via Treewidth

    Authors: Glenn Sun, Yihong Zhang, Haobin Ni

    Abstract: We solve the optimal extraction problem for e-graphs by first showing a connection between e-graphs and cyclic monotone Boolean circuits, then solving the weighted satisfiability problem for such circuits. The solution is a parameterized algorithm based on treewidth. Additionally, we show how the circuit view of e-graphs allows us to apply simplification techniques that are not possible when opera…

    Submitted 30 August, 2024; originally announced August 2024.

  43. arXiv:2408.16946  [pdf, other]

    cs.CG

    Best of two worlds: Cartesian sampling and volume computation for distance-constrained configuration spaces using Cayley coordinates

    Authors: Yichi Zhang, Meera Sitharam

    Abstract: Volume calculation of configurational spaces acts as a vital part in configurational entropy calculation, which contributes towards calculating free energy landscape for molecular systems. In this article, we present our sampling-based volume computation method using distance-based Cayley coordinate, mitigating drawbacks: our method guarantees that the sampling procedure stays in lower-dimensional…

    Submitted 29 August, 2024; originally announced August 2024.

  44. arXiv:2408.16582  [pdf, other]

    cs.CV cs.CR

    FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection

    Authors: Yangxiang Zhang, Yuezun Li, Ao Luo, Jiaran Zhou, Junyu Dong

    Abstract: With the rise in popularity of portable devices, the spread of falsified media on social platforms has become rampant. This necessitates the timely identification of authentic content. However, most advanced detection methods are computationally heavy, hindering their real-time application. In this paper, we describe an efficient two-stream architecture for real-time image manipulation detection.…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: BMVC 2024

  45. arXiv:2408.16451  [pdf, other]

    cs.CV

    Weakly Supervised Object Detection for Automatic Tooth-marked Tongue Recognition

    Authors: Yongcun Zhang, Jiajun Xu, Yina He, Shaozi Li, Zhiming Luo, Huangwei Lei

    Abstract: Tongue diagnosis in Traditional Chinese Medicine (TCM) is a crucial diagnostic method that can reflect an individual's health status. Traditional methods for identifying tooth-marked tongues are subjective and inconsistent because they rely on practitioner experience. We propose a novel fully automated Weakly Supervised method using Vision transformer and Multiple instance learning WSVM for tongue…

    Submitted 29 August, 2024; originally announced August 2024.

  46. arXiv:2408.16315  [pdf, other]

    cs.HC cs.LG eess.SP

    Passenger hazard perception based on EEG signals for highly automated driving vehicles

    Authors: Ashton Yu Xuan Tan, Yingkai Yang, Xiaofei Zhang, Bowen Li, Xiaorong Gao, Sifa Zheng, Jianqiang Wang, Xinyu Gu, Jun Li, Yang Zhao, Yuxin Zhang, Tania Stathaki

    Abstract: Enhancing the safety of autonomous vehicles is crucial, especially given recent accidents involving automated systems. As passengers in these vehicles, humans' sensory perception and decision-making can be integrated with autonomous systems to improve safety. This study explores neural mechanisms in passenger-vehicle interactions, leading to the development of a Passenger Cognitive Model (PCM) and…

    Submitted 29 August, 2024; originally announced August 2024.

  47. arXiv:2408.16237  [pdf, other]

    cs.DB

    MQRLD: A Multimodal Data Retrieval Platform with Query-aware Feature Representation and Learned Index Based on Data Lake

    Authors: Ming Sheng, Shuliang Wang, Yong Zhang, Kaige Wang, Jingyi Wang, Yi Luo, Rui Hao

    Abstract: Multimodal data has become a crucial element in the realm of big data analytics, driving advancements in data exploration, data mining, and empowering artificial intelligence applications. To support high-quality retrieval for these cutting-edge applications, a robust data retrieval platform should meet the requirements for transparent data storage, rich hybrid queries, effective feature represent…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 36 pages, 28 figures

  48. arXiv:2408.16132  [pdf, other]

    eess.AS cs.MM cs.SD

    SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge

    Authors: You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, Zhiyao Duan

    Abstract: With the advancements in singing voice generation and the growing presence of AI singers on media platforms, the inaugural Singing Voice Deepfake Detection (SVDD) Challenge aims to advance research in identifying AI-generated singing voices from authentic singers. This challenge features two tracks: a controlled setting track (CtrSVDD) and an in-the-wild scenario track (WildSVDD). The CtrSVDD trac…

    Submitted 28 August, 2024; originally announced August 2024.

  49. arXiv:2408.16094  [pdf, ps, other]

    cs.DC

    Monadring: A lightweight consensus protocol to offer Validation-as-a-Service to AVS nodes

    Authors: Yu Zhang, Xiao Yan, Gang Tang, Helena Wang

    Abstract: Existing blockchain networks are often large-scale, requiring transactions to be synchronized across the entire network to reach consensus. On-chain computations can be prohibitively expensive, making many CPU-intensive computations infeasible. Inspired by the structure of IBM's token ring networks, we propose a lightweight consensus protocol called Monadring to address these issues. Monadring all…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 23 pages, 3 figures

  50. arXiv:2408.15978  [pdf, other]

    cs.AI

    WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration

    Authors: Yao Zhang, Zijian Ma, Yunpu Ma, Zhen Han, Yu Wu, Volker Tresp

    Abstract: LLM-based autonomous agents often fail to execute complex web tasks that require dynamic interaction due to the inherent uncertainty and complexity of these environments. Existing LLM-based web agents typically rely on rigid, expert-designed policies specific to certain states and actions, which lack the flexibility and generalizability needed to adapt to unseen tasks. In contrast, humans excel by…

    Submitted 28 August, 2024; originally announced August 2024.