
Showing 1–50 of 2,531 results for author: Zhou, Y

Searching in archive cs.
  1. arXiv:2409.03594  [pdf, other]

    cs.GT

    A Complete Landscape of EFX Allocations of Mixed Manna on Graphs

    Authors: Yu Zhou, Tianze Wei, Minming Li, Bo Li

    Abstract: We study envy-free up to any item (EFX) allocations on graphs where vertices and edges represent agents and items respectively. An agent is only interested in items that are incident to her and all other items have zero marginal values to her. Christodoulou et al. [EC, 2023] first proposed this setting and studied the case of goods. We extend this setting to the case of mixed manna where an item m…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted in IJCAI 2024

  2. arXiv:2409.02386  [pdf, other]

    cs.CR cs.SE

    Dissecting Payload-based Transaction Phishing on Ethereum

    Authors: Zhuo Chen, Yufeng Hu, Bowen He, Dong Luo, Lei Wu, Yajin Zhou

    Abstract: In recent years, a more advanced form of phishing has arisen on Ethereum, surpassing early-stage, simple transaction phishing. This new form, which we refer to as payload-based transaction phishing (PTXPHISH), manipulates smart contract interactions through the execution of malicious payloads to deceive users. PTXPHISH has rapidly emerged as a significant threat, leading to incidents that caused l…

    Submitted 3 September, 2024; originally announced September 2024.

  3. arXiv:2409.02123  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    PuYun: Medium-Range Global Weather Forecasting Using Large Kernel Attention Convolutional Networks

    Authors: Shengchen Zhu, Yiming Chen, Peiying Yu, Xiang Qu, Yuxiao Zhou, Yiming Ma, Zhizhan Zhao, Yukai Liu, Hao Mi, Bin Wang

    Abstract: Accurate weather forecasting is essential for understanding and mitigating weather-related impacts. In this paper, we present PuYun, an autoregressive cascade model that leverages large kernel attention convolutional networks. The model's design inherently supports extended weather prediction horizons while broadening the effective receptive field. The integration of large kernel attention mechani…

    Submitted 1 September, 2024; originally announced September 2024.

  4. arXiv:2409.01971  [pdf, other]

    cs.CV

    Snapshot: Towards Application-centered Models for Pedestrian Trajectory Prediction in Urban Traffic Environments

    Authors: Nico Uhlemann, Yipeng Zhou, Tobias Mohr, Markus Lienkamp

    Abstract: This paper explores pedestrian trajectory prediction in urban traffic while focusing on both model accuracy and real-world applicability. While promising approaches exist, they are often not publicly available, revolve around pedestrian datasets excluding traffic-related information, or resemble architectures that are either not real-time capable or robust. To address these limitations, we first i…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 8 Pages, 9 Figures

  5. arXiv:2409.01075  [pdf, other]

    cs.DC

    Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization

    Authors: Yangjie Zhou, Honglin Zhu, Qian Qiu, Weihao Cui, Zihan Liu, Cong Guo, Siyuan Feng, Jintao Meng, Haidong Lan, Jingwen Leng, Wenxi Zhu, Minwen Deng

    Abstract: Dynamic-shape deep neural networks (DNNs) are rapidly evolving, attracting attention for their ability to handle variable input sizes in real-time applications. However, existing compilation optimization methods for such networks often rely heavily on predefined samples to guide the compilation process, which restricts their adaptability and efficiency. These sample-driven methods struggle to effi…

    Submitted 2 September, 2024; originally announced September 2024.

  6. arXiv:2409.00960  [pdf, other]

    cs.CR

    Unveiling the Vulnerability of Private Fine-Tuning in Split-Based Frameworks for Large Language Models: A Bidirectionally Enhanced Attack

    Authors: Guanzhong Chen, Zhenghan Qin, Mingxin Yang, Yajie Zhou, Tao Fan, Tianyu Du, Zenglin Xu

    Abstract: Recent advancements in pre-trained large language models (LLMs) have significantly influenced various domains. Adapting these models for specific tasks often involves fine-tuning (FT) with private, domain-specific data. However, privacy concerns keep this data undisclosed, and the computational demands for deploying LLMs pose challenges for resource-limited data holders. This has sparked interest…

    Submitted 4 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: ACM Conference on Computer and Communications Security 2024 (CCS 24)

    ACM Class: K.6.5

  7. arXiv:2409.00947  [pdf, other]

    cs.CV cs.AI

    XNet v2: Fewer Limitations, Better Results and Greater Universality

    Authors: Yanfeng Zhou, Lingrui Li, Zichen Wang, Guole Liu, Ziwen Liu, Ge Yang

    Abstract: XNet introduces a wavelet-based X-shaped unified architecture for fully- and semi-supervised biomedical segmentation. So far, however, XNet still faces limitations, including performance degradation when images lack high-frequency (HF) information, underutilization of raw images and insufficient fusion. To address these issues, we propose XNet v2, a low- and high-frequency complementary model.…

    Submitted 2 September, 2024; originally announced September 2024.

  8. arXiv:2409.00942  [pdf, other]

    cs.CV

    VQ-Flow: Taming Normalizing Flows for Multi-Class Anomaly Detection via Hierarchical Vector Quantization

    Authors: Yixuan Zhou, Xing Xu, Zhe Sun, Jingkuan Song, Andrzej Cichocki, Heng Tao Shen

    Abstract: Normalizing flows, a category of probabilistic models famed for their capabilities in modeling complex data distributions, have exhibited remarkable efficacy in unsupervised anomaly detection. This paper explores the potential of normalizing flows in multi-class anomaly detection, wherein the normal data is compounded with multiple classes without providing class labels. Through the integration of…

    Submitted 2 September, 2024; originally announced September 2024.

  9. GroomCap: High-Fidelity Prior-Free Hair Capture

    Authors: Yuxiao Zhou, Menglei Chai, Daoye Wang, Sebastian Winberg, Erroll Wood, Kripasindhu Sarkar, Markus Gross, Thabo Beeler

    Abstract: Despite recent advances in multi-view hair reconstruction, achieving strand-level precision remains a significant challenge due to inherent limitations in existing capture pipelines. We introduce GroomCap, a novel multi-view hair capture method that reconstructs faithful and high-fidelity hair geometry without relying on external data priors. To address the limitations of conventional reconstructi…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Accepted by SIGGRAPH Asia 2024

  10. arXiv:2409.00031  [pdf, other]

    cs.HC cs.AI

    Quality Assessment in the Era of Large Models: A Survey

    Authors: Zicheng Zhang, Yingjie Zhou, Chunyi Li, Baixuan Zhao, Xiaohong Liu, Guangtao Zhai

    Abstract: Quality assessment, which evaluates the visual quality level of multimedia experiences, has garnered significant attention from researchers and has evolved substantially through dedicated efforts. Before the advent of large models, quality assessment typically relied on small expert models tailored for specific tasks. While these smaller models are effective at handling their designated tasks and…

    Submitted 17 August, 2024; originally announced September 2024.

  11. arXiv:2408.16719  [pdf, other]

    cs.CV

    H-SGANet: Hybrid Sparse Graph Attention Network for Deformable Medical Image Registration

    Authors: Yufeng Zhou, Wenming Cao

    Abstract: The integration of Convolutional Neural Network (ConvNet) and Transformer has emerged as a strong candidate for image registration, leveraging the strengths of both models and a large parameter space. However, this hybrid model, treating brain MRI volumes as grid or sequence structures, faces challenges in accurately representing anatomical connectivity, diverse brain regions, and vital connection…

    Submitted 29 August, 2024; originally announced August 2024.

  12. arXiv:2408.16403  [pdf, other]

    cs.LG

    DeepSPoC: A Deep Learning-Based PDE Solver Governed by Sequential Propagation of Chaos

    Authors: Kai Du, Yongle Xie, Tao Zhou, Yuancheng Zhou

    Abstract: Sequential propagation of chaos (SPoC) is a recently developed tool to solve mean-field stochastic differential equations and their related nonlinear Fokker-Planck equations. Based on the theory of SPoC, we present a new method (deepSPoC) that combines the interacting particle system of SPoC and deep learning. Under the framework of deepSPoC, two classes of frequently used deep models include full…

    Submitted 29 August, 2024; originally announced August 2024.

  13. arXiv:2408.16256  [pdf]

    cs.LG cs.AI cs.NE q-bio.QM

    Coalitions of AI-based Methods Predict 15-Year Risks of Breast Cancer Metastasis Using Real-World Clinical Data with AUC up to 0.9

    Authors: Xia Jiang, Yijun Zhou, Alan Wells, Adam Brufsky

    Abstract: Breast cancer is one of the two cancers responsible for the most deaths in women, with about 42,000 deaths each year in the US. That there are over 300,000 breast cancers newly diagnosed each year suggests that only a fraction of the cancers result in mortality. Thus, most of the women undergo seemingly curative treatment for localized cancers, but a significant fraction later succumb to metastatic disease…

    Submitted 29 August, 2024; originally announced August 2024.

  14. arXiv:2408.16181  [pdf, other]

    math.OC cs.LG

    A Minibatch-SGD-Based Learning Meta-Policy for Inventory Systems with Myopic Optimal Policy

    Authors: Jiameng Lyu, Jinxing Xie, Shilin Yuan, Yuan Zhou

    Abstract: Stochastic gradient descent (SGD) has proven effective in solving many inventory control problems with demand learning. However, it often faces the pitfall of an infeasible target inventory level that is lower than the current inventory level. Several recent works (e.g., Huh and Rusmevichientong (2009), Shi et al. (2016)) have successfully resolved this issue in various inventory systems. However, t…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Forthcoming in Management Science

  15. VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

    Authors: Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia

    Abstract: Recent AIGC systems possess the capability to generate digital multimedia content based on human language instructions, such as text, image and video. However, when it comes to speech, existing methods related to human instruction-to-speech generation exhibit two limitations. Firstly, they require the division of inputs into content prompt (transcript) and description prompt (style and speaker), i…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Multimedia 2024

  16. arXiv:2408.15498  [pdf]

    cs.LG cs.AI cs.NE q-bio.QM

    Deep Learning to Predict Late-Onset Breast Cancer Metastasis: the Single Hyperparameter Grid Search (SHGS) Strategy for Meta Tuning Concerning Deep Feed-forward Neural Network

    Authors: Yijun Zhou, Om Arora-Jain, Xia Jiang

    Abstract: While machine learning has advanced in medicine, its widespread use in clinical applications, especially in predicting breast cancer metastasis, is still limited. We have been dedicated to constructing a DFNN model to predict breast cancer metastasis n years in advance. However, the challenge lies in efficiently identifying optimal hyperparameter values through grid search, given the constraints o…

    Submitted 27 August, 2024; originally announced August 2024.

  17. arXiv:2408.15287  [pdf, other]

    quant-ph cs.LG

    Quantum-Powered Personalized Learning

    Authors: Yifan Zhou, Chong Cheng Xu, Mingi Song, Yew Kee Wong

    Abstract: This paper explores the transformative potential of quantum computing in the realm of personalized learning. Traditional machine learning models and GPU-based approaches have long been utilized to tailor educational experiences to individual student needs. However, these methods face significant challenges in terms of scalability, computational efficiency, and real-time adaptation to the dynamic n…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 9 pages, 2 figures

  18. arXiv:2408.14735  [pdf, other]

    cs.MM cs.CR cs.DC

    PPVF: An Efficient Privacy-Preserving Online Video Fetching Framework with Correlated Differential Privacy

    Authors: Xianzhi Zhang, Yipeng Zhou, Di Wu, Quan Z. Sheng, Miao Hu, Linchang Xiao

    Abstract: Online video streaming has evolved into an integral component of the contemporary Internet landscape. Yet, the disclosure of user requests presents formidable privacy challenges. As users stream their preferred online videos, their requests are automatically seized by video content providers, potentially leaking users' privacy. Unfortunately, current protection methods are not well-suited to pre…

    Submitted 26 August, 2024; originally announced August 2024.

  19. arXiv:2408.14594  [pdf, other]

    cs.CV

    MMR: Evaluating Reading Ability of Large Multimodal Models

    Authors: Jian Chen, Ruiyi Zhang, Yufan Zhou, Ryan Rossi, Jiuxiang Gu, Changyou Chen

    Abstract: Large multimodal models (LMMs) have demonstrated impressive capabilities in understanding various types of images, including text-rich images. Most existing text-rich image benchmarks are simple extraction-based question answering, and many LMMs now easily achieve high scores. This means that current benchmarks fail to accurately reflect the performance of different models, and a natural idea is to bui…

    Submitted 26 August, 2024; originally announced August 2024.

  20. arXiv:2408.14520  [pdf, other]

    cs.LG cs.AI cs.SI

    Towards Graph Prompt Learning: A Survey and Beyond

    Authors: Qingqing Long, Yuchen Yan, Peiyan Zhang, Chen Fang, Wentao Cui, Zhiyuan Ning, Meng Xiao, Ning Cao, Xiao Luo, Lingjun Xu, Shiyue Jiang, Zheng Fang, Chong Chen, Xian-Sheng Hua, Yuanchun Zhou

    Abstract: Large-scale "pre-train and prompt learning" paradigms have demonstrated remarkable adaptability, enabling broad applications across diverse domains such as question answering, image recognition, and multimodal retrieval. This approach fully leverages the potential of large-scale pre-trained models, reducing downstream data requirements and computational costs while enhancing model applicability ac…

    Submitted 29 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: 19 pages, 2 figures

  21. arXiv:2408.14493  [pdf]

    cs.LG eess.SY

    Extraction of Typical Operating Scenarios of New Power System Based on Deep Time Series Aggregation

    Authors: Zhaoyang Qu, Zhenming Zhang, Nan Qu, Yuguang Zhou, Yang Li, Tao Jiang, Min Li, Chao Long

    Abstract: Extracting typical operational scenarios is essential for making flexible decisions in the dispatch of a new power system. This study proposed a novel deep time series aggregation scheme (DTSAs) to generate typical operational scenarios, considering the large amount of historical operational snapshot data. Specifically, DTSAs analyze the intrinsic mechanisms of different scheduling operational sce…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by CAAI Transactions on Intelligence Technology

  22. arXiv:2408.14116  [pdf, other]

    cs.LG cs.DC cs.NI eess.SP

    Hierarchical Learning and Computing over Space-Ground Integrated Networks

    Authors: Jingyang Zhu, Yuanming Shi, Yong Zhou, Chunxiao Jiang, Linling Kuang

    Abstract: Space-ground integrated networks hold great promise for providing global connectivity, particularly in remote areas where large amounts of valuable data are generated by Internet of Things (IoT) devices, but lacking terrestrial communication infrastructure. The massive data is conventionally transferred to the cloud server for centralized artificial intelligence (AI) model training, raising huge…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 14 pages, 10 figures

  23. arXiv:2408.13586  [pdf, other]

    cs.CL cs.AI

    Balancing Diversity and Risk in LLM Sampling: How to Select Your Method and Parameter for Open-Ended Text Generation

    Authors: Yuxuan Zhou, Margret Keuper, Mario Fritz

    Abstract: Sampling-based decoding strategies have been widely adopted for Large Language Models (LLMs) in numerous applications, which target a balance between diversity and quality via temperature tuning and tail truncation (e.g., top-k and top-p sampling). Considering the high dynamic range of the candidate next-token given different prefixes, recent studies propose to adaptively truncate the tail of LLM'…

    Submitted 24 August, 2024; originally announced August 2024.

  24. arXiv:2408.13479  [pdf, other]

    quant-ph cs.LG q-bio.BM

    Quantum-machine-assisted Drug Discovery: Survey and Perspective

    Authors: Yidong Zhou, Jintai Chen, Jinglei Cheng, Gopal Karemore, Marinka Zitnik, Frederic T. Chong, Junyu Liu, Tianfan Fu, Zhiding Liang

    Abstract: Drug discovery and development is a highly complex and costly endeavor, typically requiring over a decade and substantial financial investment to bring a new drug to market. Traditional computer-aided drug design (CADD) has made significant progress in accelerating this process, but the development of quantum computing offers potential due to its unique capabilities. This paper discusses the integ…

    Submitted 27 August, 2024; v1 submitted 24 August, 2024; originally announced August 2024.

    Comments: 27 pages, 10 figures

  25. arXiv:2408.13395  [pdf, other]

    cs.CV

    Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing

    Authors: Yangyang Xu, Wenqi Shao, Yong Du, Haiming Zhu, Yang Zhou, Ping Luo, Shengfeng He

    Abstract: Recent advancements in text-guided diffusion models have unlocked powerful image manipulation capabilities, yet balancing reconstruction fidelity and editability for real images remains a significant challenge. In this work, we introduce Task-Oriented Diffusion Inversion (TODInv), a novel framework that inverts and edits real images tailored to specific…

    Submitted 23 August, 2024; originally announced August 2024.

  26. arXiv:2408.13233  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

    Authors: Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou

    Abstract: The quadratic computational complexity in the self-attention mechanism of popular transformer architectures poses significant challenges for training and inference, particularly in terms of efficiency and memory requirements. Towards addressing these challenges, this paper introduces a novel fast computation method for gradient calculation in multi-layer transformer models. Our approach enables th…

    Submitted 23 August, 2024; originally announced August 2024.

  27. arXiv:2408.12615  [pdf, other]

    eess.IV cs.CV cs.LG

    Pediatric TSC-Related Epilepsy Classification from Clinical MR Images Using Quantum Neural Network

    Authors: Ling Lin, Yihang Zhou, Zhanqi Hu, Dian Jiang, Congcong Liu, Shuo Zhou, Yanjie Zhu, Jianxiang Liao, Dong Liang, Hairong Zheng, Haifeng Wang

    Abstract: Tuberous sclerosis complex (TSC) manifests as a multisystem disorder with significant neurological implications. This study addresses the critical need for robust classification models tailored to TSC in pediatric patients, introducing QResNet, a novel deep learning model seamlessly integrating conventional convolutional neural networks with quantum neural networks. The model incorporates a two-lay…

    Submitted 26 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 5 pages, 4 figures, 2 tables, presented at ISBI 2024

  28. arXiv:2408.12590  [pdf, other]

    cs.CV cs.AI

    xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

    Authors: Can Qin, Congying Xia, Krithika Ramakrishnan, Michael Ryoo, Lifu Tu, Yihao Feng, Manli Shu, Honglu Zhou, Anas Awadalla, Jun Wang, Senthil Purushwalkam, Le Xue, Yingbo Zhou, Huan Wang, Silvio Savarese, Juan Carlos Niebles, Zeyuan Chen, Ran Xu, Caiming Xiong

    Abstract: We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI's Sora, we explore the latent diffusion model (LDM) architecture and introduce a video variational autoencoder (VidVAE). VidVAE compresses video data both spatially and temporally, significantly reducing the length of vi…

    Submitted 31 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV24 AI4VA

  29. arXiv:2408.12300  [pdf, other]

    cs.LG

    Tackling Data Heterogeneity in Federated Learning via Loss Decomposition

    Authors: Shuang Zeng, Pengxin Guo, Shuai Wang, Jianbo Wang, Yuyin Zhou, Liangqiong Qu

    Abstract: Federated Learning (FL) is a rising approach towards collaborative and privacy-preserving machine learning where large-scale medical datasets remain localized to each client. However, the issue of data heterogeneity among clients often compels local models to diverge, leading to suboptimal global models. To mitigate the impact of data heterogeneity on FL performance, we start by analyzing how FL…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted at MICCAI 2024

  30. arXiv:2408.12161  [pdf, other]

    cs.CV

    Rebalancing Multi-Label Class-Incremental Learning

    Authors: Kaile Du, Yifan Zhou, Fan Lyu, Yuyang Li, Junzhou Xie, Yixi Shen, Fuyuan Hu, Guangcan Liu

    Abstract: Multi-label class-incremental learning (MLCIL) is essential for real-world multi-label applications, allowing models to learn new labels while retaining previously learned knowledge continuously. However, recent MLCIL approaches can only achieve suboptimal performance due to the oversight of the positive-negative imbalance problem, which manifests at both the label and loss levels because of the t…

    Submitted 22 August, 2024; originally announced August 2024.

  31. arXiv:2408.12115  [pdf]

    cs.LG cs.CE econ.GN

    Cross-border Commodity Pricing Strategy Optimization via Mixed Neural Network for Time Series Analysis

    Authors: Lijuan Wang, Yijia Hu, Yan Zhou

    Abstract: In the context of global trade, cross-border commodity pricing largely determines the competitiveness and market share of businesses. However, existing methodologies often prove inadequate, as they lack the agility and precision required to effectively respond to the dynamic international markets. Time series data is of great significance in commodity pricing and can reveal market dynamics and tre…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 30 pages

  32. arXiv:2408.12113  [pdf]

    cs.LG cs.AI cs.CY

    Risk Analysis in Customer Relationship Management via Quantile Region Convolutional Neural Network-Long Short-Term Memory and Cross-Attention Mechanism

    Authors: Yaowen Huang, Jun Der Leu, Baoli Lu, Yan Zhou

    Abstract: Risk analysis is an important business decision support task in customer relationship management (CRM), involving the identification of potential risks or challenges that may affect customer satisfaction, retention rates, and overall business performance. To enhance risk analysis in CRM, this paper combines the advantages of quantile region convolutional neural network-long short-term memory (QRCN…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 44 pages

  33. arXiv:2408.11907  [pdf, other]

    cs.IT

    Higher-order Interpretations of Deepcode, a Learned Feedback Code

    Authors: Yingyao Zhou, Natasha Devroye, Gyorgy Turan, Milos Zefran

    Abstract: We present an interpretation of Deepcode, a learned feedback code that showcases higher-order error correction relative to an earlier interpretable model. By interpretation, we mean succinct analytical encoder and decoder expressions (albeit with learned parameters) in which the role of feedback in achieving error correction is easy to understand. By higher-order, we mean that longer sequences of…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: accepted to 60th Annual Allerton Conference

  34. arXiv:2408.11840  [pdf]

    cs.CV cs.AI

    Joint PET-MRI Reconstruction with Diffusion Stochastic Differential Model

    Authors: Taofeng Xie, Zhuoxu Cui, Congcong Liu, Chen Luo, Huayu Wang, Yuanzhi Zhang, Xuemei Wang, Yihang Zhou, Qiyu Jin, Guoqing Chen, Dong Liang, Haifeng Wang

    Abstract: PET suffers from a low signal-to-noise ratio. Meanwhile, the k-space data acquisition process in MRI is time-consuming in PET-MRI systems. We aim to accelerate MRI and improve PET image quality. This paper proposes a novel joint reconstruction model by diffusion stochastic differential equations based on learning the joint probability distribution of PET and MRI. Comparison of the results underscores the…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted as ISMRM 2024 Digital poster 6575. 04-09 May 2024 Singapore

    Journal ref: ISMRM 2024 Digital poster 6575

  35. arXiv:2408.11795  [pdf, other]

    cs.CV

    EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model

    Authors: Feipeng Ma, Yizhou Zhou, Hebei Li, Zilong He, Siying Wu, Fengyun Rao, Yueyi Zhang, Xiaoyan Sun

    Abstract: In the realm of multimodal research, numerous studies leverage substantial image-text pairs to conduct modal alignment learning, transforming Large Language Models (LLMs) into Multimodal LLMs and excelling in a variety of visual-language tasks. The prevailing methodologies primarily fall into two categories: self-attention-based and cross-attention-based methods. While self-attention-based methods…

    Submitted 21 August, 2024; originally announced August 2024.

  36. arXiv:2408.11552  [pdf, other]

    cs.AI

    Explainable Deep Learning Framework for Human Activity Recognition

    Authors: Yiran Huang, Yexu Zhou, Haibin Zhao, Till Riedel, Michael Beigl

    Abstract: In the realm of human activity recognition (HAR), the integration of explainable Artificial Intelligence (XAI) emerges as a critical necessity to elucidate the decision-making processes of complex models, fostering transparency and trust. Traditional explanatory methods like Class Activation Mapping (CAM) and attention mechanisms, although effective in highlighting regions vital for decisions in v…

    Submitted 21 August, 2024; originally announced August 2024.

  37. arXiv:2408.10819  [pdf, other]

    cs.CL cs.AI

    Exploiting Large Language Models Capabilities for Question Answer-Driven Knowledge Graph Completion Across Static and Temporal Domains

    Authors: Rui Yang, Jiahao Zhu, Jianping Man, Li Fang, Yi Zhou

    Abstract: Knowledge graph completion (KGC) aims to identify missing triples in a knowledge graph (KG). This is typically achieved through tasks such as link prediction and instance completion. However, these methods often focus on either static knowledge graphs (SKGs) or temporal knowledge graphs (TKGs), addressing only within-scope triples. This paper introduces a new generative completion framework called…

    Submitted 20 August, 2024; originally announced August 2024.

  38. arXiv:2408.10680  [pdf, other]

    cs.CL cs.SD eess.AS

    Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper

    Authors: Tianyi Xu, Kaixun Huang, Pengcheng Guo, Yu Zhou, Longtao Huang, Hui Xue, Lei Xie

    Abstract: Pre-trained multilingual speech foundation models, like Whisper, have shown impressive performance across different languages. However, adapting these models to new or specific languages is computationally expensive and faces catastrophic forgetting problems. Addressing these issues, our study investigates strategies to enhance the model on new languages in the absence of original training data, w…

    Submitted 20 August, 2024; originally announced August 2024.

  39. arXiv:2408.10086  [pdf, other]

    cs.AI

    ARMADA: Attribute-Based Multimodal Data Augmentation

    Authors: Xiaomeng Jin, Jeonghwan Kim, Yu Zhou, Kuan-Hao Huang, Te-Lin Wu, Nanyun Peng, Heng Ji

    Abstract: In Multimodal Language Models (MLMs), the cost of manually annotating high-quality image-text pair data for fine-tuning and alignment is extremely high. While existing multimodal data augmentation frameworks propose ways to augment image-text pairs, they either suffer from semantic inconsistency between texts and images, or generate unrealistic images, causing a knowledge gap with real-world example…

    Submitted 19 August, 2024; originally announced August 2024.

  40. arXiv:2408.09478  [pdf, other]

    cs.LG cs.CR

    Mitigating Noise Detriment in Differentially Private Federated Learning with Model Pre-training

    Authors: Huitong Jin, Yipeng Zhou, Laizhong Cui, Quan Z. Sheng

    Abstract: Pre-training exploits public datasets to pre-train an advanced machine learning model, so that the model can be easily tuned to adapt to various downstream tasks. Pre-training has been extensively explored to mitigate computation and communication resource consumption. Inspired by these advantages, we are the first to explore how model pre-training can mitigate noise detriment in differentially pr…

    Submitted 18 August, 2024; originally announced August 2024.

  41. arXiv:2408.09241  [pdf, other]

    cs.CV eess.IV

    Re-boosting Self-Collaboration Parallel Prompt GAN for Unsupervised Image Restoration

    Authors: Xin Lin, Yuyan Zhou, Jingtong Yue, Chao Ren, Kelvin C. K. Chan, Lu Qi, Ming-Hsuan Yang

    Abstract: Unsupervised restoration approaches based on generative adversarial networks (GANs) offer a promising solution without requiring paired datasets. Yet, these GAN-based approaches struggle to surpass the performance of conventional unsupervised GAN-based frameworks without significantly modifying model structures or increasing the computational complexity. To address these issues, we propose a self-…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: This paper is an extended and revised version of our previous work "Unsupervised Image Denoising in Real-World Scenarios via Self-Collaboration Parallel Generative Adversarial Branches" (https://openaccess.thecvf.com/content/ICCV2023/papers/Lin_Unsupervised_Image_Denoising_in_Real-World_Scenarios_via_Self-Collaboration_Parallel_Generative_ICCV_2023_paper.pdf)

  42. arXiv:2408.09097  [pdf, other]

    cs.CV cs.AI

    Depth-guided Texture Diffusion for Image Semantic Segmentation

    Authors: Wei Sun, Yuan Li, Qixiang Ye, Jianbin Jiao, Yanzhao Zhou

    Abstract: Depth information provides valuable insights into the 3D structure, especially the outline of objects, which can be utilized to improve semantic segmentation tasks. However, a naive fusion of depth information can disrupt features and compromise accuracy due to the modality gap between the depth and the vision. In this work, we introduce a Depth-guided Texture Diffusion approach that effectively…

    Submitted 17 August, 2024; originally announced August 2024.

  43. arXiv:2408.08736  [pdf, other]

    cs.CV

    Task-Aware Dynamic Transformer for Efficient Arbitrary-Scale Image Super-Resolution

    Authors: Tianyi Xu, Yiji Zhou, Xiaotao Hu, Kai Zhang, Anran Zhang, Xingye Qiu, Jun Xu

    Abstract: Arbitrary-scale super-resolution (ASSR) aims to learn a single model for image super-resolution at arbitrary magnifying scales. Existing ASSR networks typically comprise an off-the-shelf scale-agnostic feature extractor and an arbitrary scale upsampler. These feature extractors often use fixed network architectures to address different ASSR inference tasks, each of which is characterized by an inp…

    Submitted 25 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: ECAI 2024

  44. arXiv:2408.08723  [pdf, other]

    cs.CV cs.AI

    Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS

    Authors: Wei Sun, Xiaosong Zhang, Fang Wan, Yanzhao Zhou, Yuan Li, Qixiang Ye, Jianbin Jiao

    Abstract: Novel View Synthesis (NVS) without Structure-from-Motion (SfM) pre-processed camera poses--referred to as SfM-free methods--is crucial for promoting rapid response capabilities and enhancing robustness against variable operating conditions. Recent SfM-free methods have integrated pose optimization, designing end-to-end frameworks for joint camera pose estimation and NVS. However, most existing wor…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.07504 by other authors

  45. arXiv:2408.08642  [pdf, other]

    cs.LG

    The Power of Bias: Optimizing Client Selection in Federated Learning with Heterogeneous Differential Privacy

    Authors: Jiating Ma, Yipeng Zhou, Qi Li, Quan Z. Sheng, Laizhong Cui, Jiangchuan Liu

    Abstract: To preserve the data privacy, the federated learning (FL) paradigm emerges in which clients only expose model gradients rather than original data for conducting model training. To enhance the protection of model gradients in FL, differentially private federated learning (DPFL) is proposed which incorporates differentially private (DP) noises to obfuscate gradients before they are exposed. Yet, an…

    Submitted 16 August, 2024; originally announced August 2024.

  46. arXiv:2408.08074  [pdf, other]

    cs.IT cs.AI cs.LG eess.SP

    A Survey on Integrated Sensing, Communication, and Computation

    Authors: Dingzhu Wen, Yong Zhou, Xiaoyang Li, Yuanming Shi, Kaibin Huang, Khaled B. Letaief

    Abstract: The forthcoming generation of wireless technology, 6G, promises a revolutionary leap beyond traditional data-centric services. It aims to usher in an era of ubiquitous intelligent services, where everything is interconnected and intelligent. This vision requires the seamless integration of three fundamental modules: Sensing for information acquisition, communication for information sharing, and co…

    Submitted 15 August, 2024; originally announced August 2024.

  47. arXiv:2408.07673  [pdf]

    cs.LG cs.AI cs.NE q-bio.QM

    Deep Learning: a Heuristic Three-stage Mechanism for Grid Searches to Optimize the Future Risk Prediction of Breast Cancer Metastasis Using EHR-based Clinical Data

    Authors: Xia Jiang, Yijun Zhou, Chuhan Xu, Adam Brufsky, Alan Wells

    Abstract: A grid search, at the cost of training and testing a large number of models, is an effective way to optimize the prediction performance of deep learning models. A challenging task concerning grid search is the time management. Without a good time management scheme, a grid search can easily be set off as a mission that will not finish in our lifetime. In this study, we introduce a heuristic three-s…

    Submitted 15 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  48. arXiv:2408.07543  [pdf, other]

    cs.CV cs.CL

    MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark

    Authors: Minxuan Zhou, Hao Liang, Tianpeng Li, Zhiyu Wu, Mingan Lin, Linzhuang Sun, Yaqi Zhou, Yan Zhang, Xiaoqin Huang, Yicong Chen, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: With the development of Multimodal Large Language Models (MLLMs), the evaluation of multimodal models in the context of mathematical problems has become a valuable research field. Multimodal visual-textual mathematical reasoning serves as a critical indicator for evaluating the comprehension and complex multi-step quantitative reasoning abilities of MLLMs. However, previous multimodal math benchma…

    Submitted 23 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  49. arXiv:2408.07516  [pdf, other]

    cs.CV eess.IV

    DIffSteISR: Harnessing Diffusion Prior for Superior Real-world Stereo Image Super-Resolution

    Authors: Yuanbo Zhou, Xinlin Zhang, Wei Deng, Tao Wang, Tao Tan, Qinquan Gao, Tong Tong

    Abstract: We introduce DiffSteISR, a pioneering framework for reconstructing real-world stereo images. DiffSteISR utilizes the powerful prior knowledge embedded in pre-trained text-to-image model to efficiently recover the lost texture details in low-resolution stereo images. Specifically, DiffSteISR implements a time-aware stereo cross attention with temperature adapter (TASCATA) to guide the diffusion pro…

    Submitted 14 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  50. arXiv:2408.07060  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

    Authors: Kexun Zhang, Weiran Yao, Zuxin Liu, Yihao Feng, Zhiwei Liu, Rithesh Murthy, Tian Lan, Lei Li, Renze Lou, Jiacheng Xu, Bo Pang, Yingbo Zhou, Shelby Heinecke, Silvio Savarese, Huan Wang, Caiming Xiong

    Abstract: Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks exhibit varying strengths, excelling in certain tasks while underperforming in others. To fully harness the diversity of these agent…

    Submitted 13 August, 2024; originally announced August 2024.