Showing 1–50 of 1,996 results for author: Zhou, J

Searching in archive cs.
  1. arXiv:2409.03755  [pdf, other]

    cs.CV

    DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation

    Authors: Wenliang Zhao, Haolin Wang, Jie Zhou, Jiwen Lu

    Abstract: Diffusion probabilistic models (DPMs) have shown remarkable performance in visual synthesis but are computationally expensive due to the need for multiple evaluations during the sampling. Recent predictor-corrector diffusion samplers have significantly reduced the required number of function evaluations (NFE), but inherently suffer from a misalignment issue caused by the extra corrector step, espe…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV 2024

  2. arXiv:2409.03644  [pdf, other]

    cs.CV

    RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images

    Authors: Benzhi Wang, Jingkai Zhou, Jingqi Bai, Yang Yang, Weihua Chen, Fan Wang, Zhen Lei

    Abstract: In recent years, diffusion models have revolutionized visual generation, outperforming traditional frameworks like Generative Adversarial Networks (GANs). However, generating images of humans with realistic semantic parts, such as hands and faces, remains a significant challenge due to their intricate structural complexity. To address this issue, we propose a novel post-processing solution named R…

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2409.03512  [pdf, other]

    cs.CY cs.CL

    From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

    Authors: Jifan Yu, Zheyuan Zhang, Daniel Zhang-li, Shangqing Tu, Zhanxin Hao, Rui Miao Li, Haoxuan Li, Yuanchun Wang, Hanming Li, Linlu Gong, Jie Cao, Jiayin Lin, Jinchang Zhou, Fei Qin, Haohua Wang, Jianxiao Jiang, Lijun Deng, Yisi Zhan, Chaojun Xiao, Xusheng Dai, Xuan Yan, Nianyi Lin, Nan Zhang, Ruixin Ni, Yang Dang, et al. (8 additional authors not shown)

    Abstract: Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still holds significant potential for improvement, new AI technologies have been continuously integ…

    Submitted 5 September, 2024; originally announced September 2024.

  4. arXiv:2409.03420  [pdf, other]

    cs.CV

    mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

    Authors: Anwen Hu, Haiyang Xu, Liang Zhang, Jiabo Ye, Ming Yan, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou

    Abstract: Multimodal Large Language Models (MLLMs) have achieved promising OCR-free Document Understanding performance by increasing the supported resolution of document images. However, this comes at the cost of generating thousands of visual tokens for a single document image, leading to excessive GPU memory and slower inference times, particularly in multi-page document comprehension. In this work, to add…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 15 pages, 7 figures

  5. arXiv:2409.03213  [pdf, other]

    cs.CV

    Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction

    Authors: Shen Chen, Jiale Zhou, Lei Li

    Abstract: 3D Gaussian Splatting (3DGS) has emerged as a promising approach for 3D scene representation, offering a reduction in computational overhead compared to Neural Radiance Fields (NeRF). However, 3DGS is susceptible to high-frequency artifacts and demonstrates suboptimal performance under sparse viewpoint conditions, thereby limiting its applicability in robotics and computer vision. To address these…

    Submitted 4 September, 2024; originally announced September 2024.

  6. arXiv:2409.03179  [pdf, other]

    eess.IV cs.CV

    Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem

    Authors: Qiwen Zhu, Yanjie Wang, Shilv Cai, Liqun Chen, Jiahuan Zhou, Luxin Yan, Sheng Zhong, Xu Zou

    Abstract: Training Single-Image Super-Resolution (SISR) models using pixel-based regression losses can achieve high distortion metrics scores (e.g., PSNR and SSIM), but often results in blurry images due to insufficient recovery of high-frequency details. Conversely, using GAN or perceptual losses can produce sharp images with high perceptual metric scores (e.g., LPIPS), but may introduce artifacts and inco…

    Submitted 4 September, 2024; originally announced September 2024.

  7. arXiv:2409.02834  [pdf, other]

    cs.CL

    CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models

    Authors: Wentao Liu, Qianjun Pan, Yi Zhang, Zhuo Liu, Ji Wu, Jie Zhou, Aimin Zhou, Qin Chen, Bo Jiang, Liang He

    Abstract: Large language models (LLMs) have obtained promising results in mathematical reasoning, which is a foundational skill for human intelligence. Most previous studies focus on improving and measuring the performance of LLMs based on textual math reasoning datasets (e.g., MATH, GSM8K). Recently, a few researchers have released English multimodal math datasets (e.g., MATHVISTA and MATH-V) to evaluate t…

    Submitted 4 September, 2024; originally announced September 2024.

  8. arXiv:2409.02738  [pdf, other]

    cs.RO

    SOAR: Simultaneous Exploration and Photographing with Heterogeneous UAVs for Fast Autonomous Reconstruction

    Authors: Mingjie Zhang, Chen Feng, Zengzhi Li, Guiyong Zheng, Yiming Luo, Zhu Wang, Jinni Zhou, Shaojie Shen, Boyu Zhou

    Abstract: Unmanned Aerial Vehicles (UAVs) have gained significant popularity in scene reconstruction. This paper presents SOAR, a LiDAR-Visual heterogeneous multi-UAV system specifically designed for fast autonomous reconstruction of complex environments. Our system comprises a LiDAR-equipped explorer with a large field-of-view (FoV), alongside photographers equipped with cameras. To ensure rapid acquisitio…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Accepted to IROS2024. Code: https://github.com/SYSU-STAR/SOAR. Project page: http://sysu-star.com/SOAR/

  9. arXiv:2409.01557  [pdf, other]

    cs.CV

    TASL-Net: Tri-Attention Selective Learning Network for Intelligent Diagnosis of Bimodal Ultrasound Video

    Authors: Chengqian Zhao, Zhao Yao, Zhaoyu Hu, Yuanxin Xie, Yafang Zhang, Yuanyuan Wang, Shuo Li, Jianhua Zhou, Jianqiao Zhou, Yin Wang, Jinhua Yu

    Abstract: In the intelligent diagnosis of bimodal (gray-scale and contrast-enhanced) ultrasound videos, medical domain knowledge such as the way sonographers browse videos, the particular areas they emphasize, and the features they pay special attention to, plays a decisive role in facilitating precise diagnosis. Embedding medical knowledge into the deep learning network can not only enhance performance but…

    Submitted 2 September, 2024; originally announced September 2024.

  10. arXiv:2409.00410  [pdf, other]

    cs.CV

    A Hybrid Transformer-Mamba Network for Single Image Deraining

    Authors: Shangquan Sun, Wenqi Ren, Juxiang Zhou, Jianhou Gan, Rui Wang, Xiaochun Cao

    Abstract: Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions, limiting the exploitation of non-local receptive fields. In response to this issue, we introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies. Based on the prior of distinct spectra…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 12 pages, 9 figures

  11. arXiv:2409.00162  [pdf, other]

    cs.CL cs.AI

    Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback

    Authors: Jiayi Zhou, Jiaming Ji, Juntao Dai, Yaodong Yang

    Abstract: Aligning the behavior of Large language models (LLMs) with human intentions and values remains a critical challenge. Reinforcement learning from human feedback (RLHF) aligns LLMs by training a reward model (RM) on human preferences and fine-tuning the LLMs to maximize RM feedback. Despite its effectiveness and popularity, RLHF is prone to biased local optimization. It means RM fails to provide fee…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: 7 pages

  12. arXiv:2409.00121  [pdf, other]

    eess.SP cs.AI cs.LG eess.AS

    BELT-2: Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding

    Authors: Jinzhao Zhou, Yiqun Duan, Fred Chang, Thomas Do, Yu-Kai Wang, Chin-Teng Lin

    Abstract: The remarkable success of large language models (LLMs) across various multi-modality applications is well established. However, integrating large language models with humans, or brain dynamics, remains relatively unexplored. In this paper, we introduce BELT-2, a pioneering multi-task model designed to enhance both encoding and decoding performance from EEG signals. To bolster the quality of the EE…

    Submitted 28 August, 2024; originally announced September 2024.

  13. arXiv:2408.17248  [pdf, other]

    cs.CR

    DeTRAP: RISC-V Return Address Protection With Debug Triggers

    Authors: Isaac Richter, Jie Zhou, John Criswell

    Abstract: Modern microcontroller software is often written in C/C++ and suffers from control-flow hijacking vulnerabilities. Previous mitigations suffer from high performance and memory overheads and require either the presence of memory protection hardware or sophisticated program analysis in the compiler. This paper presents DeTRAP (Debug Trigger Return Address Protection). DeTRAP utilizes a full implem…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: To appear at IEEE Secure Development Conference 2024

  14. arXiv:2408.17054  [pdf]

    cs.CV

    BTMuda: A Bi-level Multi-source unsupervised domain adaptation framework for breast cancer diagnosis

    Authors: Yuxiang Yang, Xinyi Zeng, Pinxian Zeng, Binyu Yan, Xi Wu, Jiliu Zhou, Yan Wang

    Abstract: Deep learning has revolutionized the early detection of breast cancer, resulting in a significant decrease in mortality rates. However, difficulties in obtaining annotations and huge variations in distribution between training sets and real scenes have limited their clinical applications. To address these limitations, unsupervised domain adaptation (UDA) methods have been used to transfer knowledg…

    Submitted 30 August, 2024; originally announced August 2024.

  15. arXiv:2408.16582  [pdf, other]

    cs.CV cs.CR

    FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection

    Authors: Yangxiang Zhang, Yuezun Li, Ao Luo, Jiaran Zhou, Junyu Dong

    Abstract: With the rise in popularity of portable devices, the spread of falsified media on social platforms has become rampant. This necessitates the timely identification of authentic content. However, most advanced detection methods are computationally heavy, hindering their real-time application. In this paper, we describe an efficient two-stream architecture for real-time image manipulation detection.…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: BMVC 2024

  16. arXiv:2408.15657  [pdf, other]

    cs.CV cs.RO

    TeFF: Tracking-enhanced Forgetting-free Few-shot 3D LiDAR Semantic Segmentation

    Authors: Junbao Zhou, Jilin Mei, Pengze Wu, Liang Chen, Fangzhou Zhao, Xijun Zhao, Yu Hu

    Abstract: In autonomous driving, 3D LiDAR plays a crucial role in understanding the vehicle's surroundings. However, newly emerged, unannotated objects present a few-shot learning problem for semantic segmentation. This paper addresses the limitations of current few-shot semantic segmentation by exploiting the temporal continuity of LiDAR data. Employing a tracking model to generate pseudo-ground-truths…

    Submitted 28 August, 2024; originally announced August 2024.

  17. arXiv:2408.13823  [pdf, other]

    cs.RO

    Improving GNSS Positioning in Challenging Urban Areas by Digital Twin Database Correction

    Authors: Jiarong Lian, Jiayi Zhou, Guohao Zhang, Li-Ta Hsu

    Abstract: Accurate positioning technology is the foundation for industry and business applications. Although indoor and outdoor positioning techniques have been well studied separately, positioning performance in the intermediate period of changing the positioning environment is still challenging. This paper proposes a digital twin-aided positioning correction method for seamless positioning focusing on imp…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 7 pages conference paper in indoor positioning and indoor navigation 2024

  18. arXiv:2408.13697  [pdf, other]

    cs.CV

    Guided and Fused: Efficient Frozen CLIP-ViT with Feature Guidance and Multi-Stage Feature Fusion for Generalizable Deepfake Detection

    Authors: Yingjian Chen, Lei Zhang, Yakun Niu, Pei Chen, Lei Tan, Jing Zhou

    Abstract: The rise of generative models has sparked concerns about image authenticity online, highlighting the urgent need for an effective and general detector. Recent methods leveraging the frozen pre-trained CLIP-ViT model have made great progress in deepfake detection. However, these models often rely on visual-general features directly extracted by the frozen network, which contain excessive informatio…

    Submitted 24 August, 2024; originally announced August 2024.

  19. arXiv:2408.13480  [pdf, other]

    cs.DB

    Towards a Converged Relational-Graph Optimization Framework

    Authors: Yunkai Lou, Longbin Lai, Bingqing Lyu, Yufan Yang, Xiaoli Zhou, Wenyuan Yu, Ying Zhang, Jingren Zhou

    Abstract: The recent ISO SQL:2023 standard adopts SQL/PGQ (Property Graph Queries), facilitating graph-like querying within relational databases. This advancement, however, underscores a significant gap in how to effectively optimize SQL/PGQ queries within relational database systems. To address this gap, we extend the foundational SPJ (Select-Project-Join) queries to SPJM queries, which include an additiona…

    Submitted 24 August, 2024; originally announced August 2024.

  20. arXiv:2408.12829  [pdf, other]

    cs.LG cs.SD eess.AS

    Uncertainty-Aware Mean Opinion Score Prediction

    Authors: Hui Wang, Shiwan Zhao, Jiaming Zhou, Xiguang Zheng, Haoqin Sun, Xuechen Wang, Yong Qin

    Abstract: Mean Opinion Score (MOS) prediction has made significant progress in specific domains. However, the unstable performance of MOS prediction models across diverse samples presents ongoing challenges in the practical application of these systems. In this paper, we point out that the absence of uncertainty modeling is a significant limitation hindering MOS prediction systems from applying to the real…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by Interspeech 2024, oral

  21. arXiv:2408.12673  [pdf, other]

    cs.AI

    Enhancing Transferability of Adversarial Attacks with GE-AdvGAN+: A Comprehensive Framework for Gradient Editing

    Authors: Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Yuchen Zhang, Jiahao Huang, Jianlong Zhou, Fang Chen

    Abstract: Transferable adversarial attacks pose significant threats to deep neural networks, particularly in black-box scenarios where internal model information is inaccessible. Studying adversarial attack methods helps advance the performance of defense mechanisms and explore model vulnerabilities. These methods can uncover and exploit weaknesses in models, promoting the development of more robust archite…

    Submitted 31 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  22. arXiv:2408.12606  [pdf, other]

    cs.CV cs.AI

    Towards Non-invasive and Personalized Management of Breast Cancer Patients from Multiparametric MRI via A Large Mixture-of-Modality-Experts Model

    Authors: Luyang Luo, Mingxiang Wu, Mei Li, Yi Xin, Qiong Wang, Varut Vardhanabhuti, Winnie CW Chu, Zhenhui Li, Juan Zhou, Pranav Rajpurkar, Hao Chen

    Abstract: Breast magnetic resonance imaging (MRI) is the imaging technique with the highest sensitivity for detecting breast cancer and is routinely used for women at high risk. Despite the comprehensive multiparametric protocol of breast MRI, existing artificial intelligence-based studies predominantly rely on single sequences and have limited validation. Here we report a large mixture-of-modality-experts…

    Submitted 1 September, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 27 pages, 8 figures, 10 tables

  23. arXiv:2408.12236  [pdf, other]

    cs.AI

    MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient

    Authors: Yanzeng Li, Cheng Zeng, Jinchao Zhang, Jie Zhou, Lei Zou

    Abstract: Medical education relies heavily on Simulated Patients (SPs) to provide a safe environment for students to practice clinical skills, including medical image analysis. However, the high cost of recruiting qualified SPs and the lack of diverse medical imaging datasets have presented significant challenges. To address these issues, this paper introduces MedDiT, a novel knowledge-controlled conversati…

    Submitted 22 August, 2024; originally announced August 2024.

  24. arXiv:2408.11843  [pdf, other]

    cs.CL cs.AI

    Editable Fairness: Fine-Grained Bias Mitigation in Language Models

    Authors: Ruizhe Chen, Yichen Li, Jianfei Yang, Joey Tianyi Zhou, Zuozhu Liu

    Abstract: Generating fair and accurate predictions plays a pivotal role in deploying large language models (LLMs) in the real world. However, existing debiasing methods inevitably generate unfair or incorrect predictions as they are designed and evaluated to achieve parity across different social groups but leave aside individual commonsense facts, resulting in modified knowledge that elicits unreasonable o…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2405.09341

  25. arXiv:2408.11811  [pdf, other]

    cs.CV cs.RO

    EmbodiedSAM: Online Segment Any 3D Thing in Real Time

    Authors: Xiuwei Xu, Huangxing Chen, Linqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu

    Abstract: Embodied tasks require the agent to fully understand 3D scenes simultaneously with its exploration, so an online, real-time, fine-grained and highly-generalized 3D perception model is desperately needed. Since high-quality 3D data is limited, directly training such a model in 3D is almost infeasible. Meanwhile, vision foundation models (VFM) have revolutionized the field of 2D computer vision with…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Project page: https://xuxw98.github.io/ESAM/

  26. arXiv:2408.11297  [pdf, other]

    cs.CV

    Making Large Vision Language Models to be Good Few-shot Learners

    Authors: Fan Liu, Wenwen Cai, Jian Huo, Chuanyi Zhang, Delong Chen, Jun Zhou

    Abstract: Few-shot classification (FSC) is a fundamental yet challenging task in computer vision that involves recognizing novel classes from limited data. While previous methods have focused on enhancing visual features or incorporating additional modalities, Large Vision Language Models (LVLMs) offer a promising alternative due to their rich knowledge and strong visual perception. However, LVLMs risk lear…

    Submitted 20 August, 2024; originally announced August 2024.

  27. arXiv:2408.10908  [pdf, other]

    cs.RO cs.HC

    Enhancing End-to-End Autonomous Driving Systems Through Synchronized Human Behavior Data

    Authors: Yiqun Duan, Zhuoli Zhuang, Jinzhao Zhou, Yu-Cheng Chang, Yu-Kai Wang, Chin-Teng Lin

    Abstract: This paper presents a pioneering exploration into the integration of fine-grained human supervision within the autonomous driving domain to enhance system performance. The current advances in End-to-End autonomous driving normally are data-driven and rely on given expert trials. However, this reliance limits the systems' generalizability and their ability to earn human trust. Addressing this gap,…

    Submitted 20 August, 2024; originally announced August 2024.

  28. arXiv:2408.10822  [pdf, other]

    cs.LG

    Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting

    Authors: Jianxiang Zhou, Erdong Liu, Wei Chen, Siru Zhong, Yuxuan Liang

    Abstract: Traffic forecasting has emerged as a crucial research area in the development of smart cities. Although various neural networks with intricate architectures have been developed to address this problem, they still face two key challenges: i) Recent advancements in network designs for modeling spatio-temporal correlations are starting to see diminishing returns in performance enhancements. ii) Addit…

    Submitted 25 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  29. arXiv:2408.10764  [pdf, other]

    cs.CL

    Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model

    Authors: Chenhan Yuan, Fei Huang, Ru Peng, Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou

    Abstract: Transformer-based large language models (LLMs) exhibit limitations such as generating unsafe responses, unreliable reasoning, etc. Existing inference intervention approaches attempt to mitigate these issues by finetuning additional models to produce calibration signals (such as rewards) that guide the LLM's decoding process. However, this solution introduces substantial time and space overhead due…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 16 pages

  30. arXiv:2408.10679  [pdf, other]

    cs.CV

    DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba

    Authors: Shuning Xu, Xina Liu, Binbin Song, Xiangyu Chen, Qiubo Chen, Jiantao Zhou

    Abstract: Moire patterns arise when two similar repetitive patterns interfere, a phenomenon frequently observed during the capture of images or videos on screens. The color, shape, and location of moire patterns may differ across video frames, posing a challenge in learning information from adjacent frames and preserving temporal consistency. Previous video demoireing methods heavily rely on well-designed a…

    Submitted 20 August, 2024; originally announced August 2024.

  31. arXiv:2408.10567  [pdf, other]

    q-bio.NC cs.AI cs.CV cs.LG

    Prompt Your Brain: Scaffold Prompt Tuning for Efficient Adaptation of fMRI Pre-trained Model

    Authors: Zijian Dong, Yilei Wu, Zijiao Chen, Yichi Zhang, Yueming Jin, Juan Helen Zhou

    Abstract: We introduce Scaffold Prompt Tuning (ScaPT), a novel prompt-based framework for adapting large-scale functional magnetic resonance imaging (fMRI) pre-trained models to downstream tasks, with high parameter efficiency and improved performance compared to fine-tuning and baselines for prompt tuning. The full fine-tuning updates all pre-trained parameters, which may distort the learned feature space…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: MICCAI 2024

  32. arXiv:2408.09058  [pdf, other]

    cs.RO

    Vision-assisted Avocado Harvesting with Aerial Bimanual Manipulation

    Authors: Zhichao Liu, Jingzong Zhou, Caio Mucchiani, Konstantinos Karydis

    Abstract: Robotic fruit harvesting holds potential in precision agriculture to improve harvesting efficiency. While ground mobile robots are mostly employed in fruit harvesting, certain crops, like avocado trees, cannot be harvested efficiently from the ground alone. This is because of unstructured ground and planting arrangement and hard-to-reach fruits. In such cases, aerial robots integrated with manipul…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: First Two Authors Share Equal Contribution. 13 Pages, 15 Figures

  33. arXiv:2408.08601  [pdf, other]

    cs.CV

    Learning A Low-Level Vision Generalist via Visual Task Prompt

    Authors: Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, Chao Dong

    Abstract: Building a unified model for general low-level vision tasks holds significant research and practical value. Current methods encounter several critical issues. Multi-task restoration approaches can address multiple degradation-to-clean restoration tasks, while their applicability to tasks with different target domains (e.g., image stylization) is limited. Methods like PromptGIP can handle multiple…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted to ACMMM24

  34. arXiv:2408.08570  [pdf, other]

    cs.CV

    EraW-Net: Enhance-Refine-Align W-Net for Scene-Associated Driver Attention Estimation

    Authors: Jun Zhou, Chunsheng Liu, Faliang Chang, Wenqian Wang, Penghui Hao, Yiming Huang, Zhiqiang Yang

    Abstract: Associating driver attention with driving scene across two fields of view (FOVs) is a hard cross-domain perception problem, which requires comprehensive consideration of cross-view mapping, dynamic driving scene analysis, and driver status tracking. Previous methods typically focus on a single view or map attention to the scene via estimated gaze, failing to exploit the implicit connection betwee…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures

  35. arXiv:2408.08018  [pdf, other]

    cs.HC

    Investigating Size Congruency Between the Visual Perception of a VR Object and the Haptic Perception of Its Physical World Agent

    Authors: Wenqi Zheng, Dawei Xiong, Cekai Weng, Jiajun Jiang, Junwei Li, Jinni Zhou, Mingming Fan

    Abstract: The perception of physical objects and miniatures enhances the realism and immersion in VR. This work explores the relationship between haptic feedback from real objects and their visual representations in VR. The study examines how users confirm and adjust the sizes of different virtual objects. The results show that as the size of the virtual cubes increases, users are less likely to perceive th…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 8 pages, 6 figures, VINCI 2024

  36. arXiv:2408.08003  [pdf, other]

    cs.CL

    Leveraging Web-Crawled Data for High-Quality Fine-Tuning

    Authors: Jing Zhou, Chenglin Jiang, Wei Shen, Xiao Zhou, Xiaonan He

    Abstract: Most large language models are fine-tuned using either expensive human-annotated data or GPT-4 generated data which cannot guarantee performance in certain domains. We argue that although the web-crawled data often has formatting errors causing semantic inaccuracies, it can still serve as a valuable source for high-quality supervised fine-tuning in specific domains without relying on advanced mode…

    Submitted 15 August, 2024; originally announced August 2024.

  37. arXiv:2408.07733  [pdf, other]

    cs.LG cs.CR

    Enhancing Adversarial Attacks via Parameter Adaptive Adversarial Attack

    Authors: Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Chenyu Zhang, Jiahao Huang, Jianlong Zhou, Fang Chen

    Abstract: In recent times, the swift evolution of adversarial attacks has captured widespread attention, particularly concerning their transferability and other performance attributes. These techniques are primarily executed at the sample level, frequently overlooking the intrinsic parameters of models. Such neglect suggests that the perturbations introduced in adversarial samples might have the potential f…

    Submitted 14 August, 2024; originally announced August 2024.

  38. arXiv:2408.07556  [pdf, other]

    cs.LG

    PolyCL: Contrastive Learning for Polymer Representation Learning via Explicit and Implicit Augmentations

    Authors: Jiajun Zhou, Yijie Yang, Austin M. Mroz, Kim E. Jelfs

    Abstract: Polymers play a crucial role in a wide array of applications due to their diverse and tunable properties. Establishing the relationship between polymer representations and their properties is crucial to the computational design and screening of potential polymers via machine learning. The quality of the representation significantly influences the effectiveness of these computational methods. Here,…

    Submitted 14 August, 2024; originally announced August 2024.

  39. arXiv:2408.07444  [pdf, other]

    eess.IV cs.CV

    Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

    Authors: Senmao Wang, Haifan Gong, Runmeng Cui, Boyao Wan, Yicheng Liu, Zhonglin Hu, Haiqing Yang, Jingyang Zhou, Bo Pan, Lin Lin, Haiyue Jiang

    Abstract: Costal cartilage segmentation is crucial to various medical applications, necessitating precise and reliable techniques due to its complex anatomy and the importance of accurate diagnosis and surgical planning. We propose a novel deep learning-based approach called topology-guided deformable Mamba (TGDM) for costal cartilage segmentation. The TGDM is tailored to capture the intricate long-range co…

    Submitted 14 August, 2024; originally announced August 2024.

  40. arXiv:2408.07278  [pdf, other]

    cs.IR cs.AI cs.CV

    Scene-wise Adaptive Network for Dynamic Cold-start Scenes Optimization in CTR Prediction

    Authors: Wenhao Li, Jie Zhou, Chuan Luo, Chao Tang, Kun Zhang, Shixiong Zhao

    Abstract: In the realm of modern mobile E-commerce, providing users with nearby commercial service recommendations through location-based online services has become increasingly vital. While machine learning approaches have shown promise in multi-scene recommendation, existing methodologies often struggle to address cold-start problems in unprecedented scenes: the increasing diversity of commercial choices,…

    Submitted 18 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: 10 pages, 6 figures, accepted by Recsys 2024

    MSC Class: 68T09 ACM Class: I.2.0

  41. arXiv:2408.07083  [pdf, other]

    cs.LG cs.AI

    Masked EEG Modeling for Driving Intention Prediction

    Authors: Jinzhao Zhou, Justin Sia, Yiqun Duan, Yu-Cheng Chang, Yu-Kai Wang, Chin-Teng Lin

    Abstract: Driving under drowsy conditions significantly escalates the risk of vehicular accidents. Although recent efforts have focused on using electroencephalography to detect drowsiness, helping prevent accidents caused by driving in such states, seamless human-machine interaction in driving scenarios requires a more versatile EEG-based system. This system should be capable of understanding a driver's in…

    Submitted 7 August, 2024; originally announced August 2024.

  42. arXiv:2408.06927  [pdf, other]

    cs.CV cs.LG

    Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator

    Authors: Xin Zhang, Jiawei Du, Ping Liu, Joey Tianyi Zhou

    Abstract: Dataset distillation has emerged as a technique aiming to condense informative features from large, natural datasets into a compact and synthetic form. While recent advancements have refined this technique, its performance is bottlenecked by the prevailing class-specific synthesis paradigm. Under this paradigm, synthetic data is optimized exclusively for a pre-assigned one-hot label, creating an i…

    Submitted 13 August, 2024; originally announced August 2024.

  43. arXiv:2408.05740  [pdf, other]

    cs.LG cs.AI stat.ML

    MTSCI: A Conditional Diffusion Model for Multivariate Time Series Consistent Imputation

    Authors: Jianping Zhou, Junhao Li, Guanjie Zheng, Xinbing Wang, Chenghu Zhou

    Abstract: Missing values are prevalent in multivariate time series, compromising the integrity of analyses and degrading the performance of downstream tasks. Consequently, research has focused on multivariate time series imputation, aiming to accurately impute the missing values based on available observations. A key research question is how to ensure imputation consistency, i.e., intra-consistency between…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures, accepted by CIKM 2024

  44. arXiv:2408.04967  [pdf, other]

    eess.AS cs.SD

    ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild

    Authors: Jiangyan Yi, Chu Yuan Zhang, Jianhua Tao, Chenglong Wang, Xinrui Yan, Yong Ren, Hao Gu, Junzuo Zhou

    Abstract: The growing prominence of the field of audio deepfake detection is driven by its wide range of applications, notably in protecting the public from potential fraud and other malicious activities, prompting the need for greater attention and research in this area. The ADD 2023 challenge goes beyond binary real/fake classification by emulating real-world scenarios, such as the identification of manip…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  45. arXiv:2408.04879  [pdf, other]

    cs.CV

    On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey

    Authors: Jingcai Guo, Zhijie Rao, Zhi Chen, Song Guo, Jingren Zhou, Dacheng Tao

    Abstract: Zero-shot image recognition (ZSIR) aims at empowering models to recognize and reason in unseen domains via learning generalized knowledge from limited data in the seen domain. The gist for ZSIR is to execute element-wise representation and reasoning from the input visual space to the target semantic space, which is a bottom-up modeling paradigm inspired by the process by which humans observe the w…

    Submitted 22 August, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

    Comments: 23 pages, 7 figures, and 3 tables

  46. arXiv:2408.04840  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

    Authors: Jiabo Ye, Haiyang Xu, Haowei Liu, Anwen Hu, Ming Yan, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

    Abstract: Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in executing instructions for a variety of single-image tasks. Despite this progress, significant challenges remain in modeling long image sequences. In this work, we introduce the versatile multi-modal large language model, mPLUG-Owl3, which enhances the capability for long image-sequence understanding in scenario…

    Submitted 13 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  47. arXiv:2408.04679  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Linguistic Neural Representation Learning and Sentence Retrieval from Electroencephalogram Recordings

    Authors: Jinzhao Zhou, Yiqun Duan, Ziyi Zhao, Yu-Cheng Chang, Yu-Kai Wang, Thomas Do, Chin-Teng Lin

    Abstract: Decoding linguistic information from non-invasive brain signals using EEG has gained increasing research attention due to its vast applicational potential. Recently, a number of works have adopted a generative-based framework to decode electroencephalogram (EEG) signals into sentences by utilizing the power generative capacity of pretrained large language models (LLMs). However, this approach has…

    Submitted 7 August, 2024; originally announced August 2024.

  48. arXiv:2408.03631  [pdf, ps, other]

    cs.AI cs.CL

    Large Language Models for Base Station Siting: Intelligent Deployment based on Prompt or Agent

    Authors: Yanhu Wang, Muhammad Muzammil Afzal, Zhengyang Li, Jie Zhou, Chenyuan Feng, Shuaishuai Guo, Tony Q. S. Quek

    Abstract: Traditional base station siting (BSS) methods rely heavily on drive testing and user feedback, which are laborious and require extensive expertise in communication, networking, and optimization. As large language models (LLMs) and their associated technologies advance, particularly in the realms of prompt engineering and agent engineering, network optimization will witness a revolutionary approach…

    Submitted 7 August, 2024; originally announced August 2024.

  49. arXiv:2408.03429  [pdf, other]

    quant-ph cs.ET

    MarQSim: Reconciling Determinism and Randomness in Compiler Optimization for Quantum Simulation

    Authors: Xiuqi Cao, Junyu Zhou, Yuhao Liu, Yunong Shi, Gushu Li

    Abstract: Quantum simulation, fundamental in quantum algorithm design, extends far beyond its foundational roots, powering diverse quantum computing applications. However, optimizing the compilation of quantum Hamiltonian simulation poses significant challenges. Existing approaches fall short in reconciling deterministic and randomized compilation, lack appropriate intermediate representations, and struggle…

    Submitted 6 August, 2024; originally announced August 2024.

  50. arXiv:2408.03095  [pdf, other]

    cs.SE

    TestART: Improving LLM-based Unit Test via Co-evolution of Automated Generation and Repair Iteration

    Authors: Siqi Gu, Chunrong Fang, Quanjun Zhang, Fangyuan Tian, Jianyi Zhou, Zhenyu Chen

    Abstract: Unit test is crucial for detecting bugs in individual program units but consumes time and effort. The existing automated unit test generation methods are mainly based on search-based software testing (SBST) and language models to liberate developers. Recently, large language models (LLMs) have demonstrated remarkable reasoning and generation capabilities. However, several problems limit their abil…

    Submitted 12 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.