Skip to main content

Showing 1–50 of 606 results for author: Wu, M

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.13473  [pdf, other

    cs.CL

    Why Antiwork: A RoBERTa-Based System for Work-Related Stress Identification and Leading Factor Analysis

    Authors: Tao Lu, Muzhe Wu, Xinyi Lu, Siyuan Xu, Shuyu Zhan, Anuj Tambwekar, Emily Mower Provost

    Abstract: Harsh working environments and work-related stress have been known to contribute to mental health problems such as anxiety, depression, and suicidal ideation. As such, it is paramount to create solutions that can both detect employee unhappiness and find the root cause of the problem. While prior works have examined causes of mental health using machine learning, they typically focus on general me… ▽ More

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 13 pages, 8 figures

  2. arXiv:2408.12606  [pdf, other

    cs.CV cs.AI

    Towards Non-invasive and Personalized Management of Breast Cancer Patients from Multiparametric MRI via A Large Mixture-of-Modality-Experts Model

    Authors: Luyang Luo, Mingxiang Wu, Mei Li, Yi Xin, Qiong Wang, Varut Vardhanabhuti, Winnie CW Chu, Zhenhui Li, Juan Zhou, Pranav Rajpurkar, Hao Chen

    Abstract: Breast magnetic resonance imaging (MRI) is the imaging technique with the highest sensitivity for detecting breast cancer and is routinely used for women at high risk. Despite the comprehensive multiparametric protocol of breast MRI, existing artificial intelligence-based studies predominantly rely on single sequences and have limited validation. Here we report a large mixture-of-modality-experts… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 27 pages, 8 figures, 10 tables

  3. arXiv:2408.10599  [pdf, other

    hep-ex cs.CV

    Vision Calorimeter for Anti-neutron Reconstruction: A Baseline

    Authors: Hongtian Yu, Yangu Li, Mingrui Wu, Letian Shen, Yue Liu, Yunxuan Song, Qixiang Ye, Xiaorui Lyu, Yajun Mao, Yangheng Zheng, Yunfan Liu

    Abstract: In high-energy physics, anti-neutrons ($\bar{n}$) are fundamental particles that frequently appear as final-state particles, and the reconstruction of their kinematic properties provides an important probe for understanding the governing principles. However, this confronts significant challenges instrumentally with the electromagnetic calorimeter (EMC), a typical experimental sensor but recovering… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  4. arXiv:2408.10041  [pdf, other

    cs.CV

    Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation

    Authors: Minye Wu, Tinne Tuytelaars

    Abstract: Recent advancements in photo-realistic novel view synthesis have been significantly driven by Gaussian Splatting (3DGS). Nevertheless, the explicit nature of 3DGS data entails considerable storage requirements, highlighting a pressing need for more efficient data representations. To address this, we present Implicit Gaussian Splatting (IGS), an innovative hybrid model that integrates explicit poin… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  5. arXiv:2408.09739  [pdf, other

    cs.CV

    TraDiffusion: Trajectory-Based Training-Free Image Generation

    Authors: Mingrui Wu, Oucheng Huang, Jiayi Ji, Jiale Li, Xinyue Cai, Huafeng Kuang, Jianzhuang Liu, Xiaoshuai Sun, Rongrong Ji

    Abstract: In this work, we propose a training-free, trajectory-based controllable T2I approach, termed TraDiffusion. This novel method allows users to effortlessly guide image generation via mouse trajectories. To achieve precise control, we design a distance awareness energy function to effectively guide latent variables, ensuring that the focus of generation is within the areas defined by the trajectory.… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: The code: https://github.com/och-mac/TraDiffusion

  6. arXiv:2408.09315  [pdf, other

    eess.IV cs.CV

    Unpaired Volumetric Harmonization of Brain MRI with Conditional Latent Diffusion

    Authors: Mengqi Wu, Minhui Yu, Shuaiming Jing, Pew-Thian Yap, Zhengwu Zhang, Mingxia Liu

    Abstract: Multi-site structural MRI is increasingly used in neuroimaging studies to diversify subject cohorts. However, combining MR images acquired from various sites/centers may introduce site-related non-biological variations. Retrospective image harmonization helps address this issue, but current methods usually perform harmonization on pre-extracted hand-crafted radiomic features, limiting downstream a… ▽ More

    Submitted 17 August, 2024; originally announced August 2024.

  7. arXiv:2408.08490  [pdf, other

    cs.AR

    Accelerating Mini-batch HGNN Training by Reducing CUDA Kernels

    Authors: Meng Wu, Jingkai Qiu, Mingyu Yan, Wenming Li, Yang Zhang, Zhimin Zhang, Xiaochun Ye, Dongrui Fan

    Abstract: Heterogeneous graph neural networks (HGNNs) are essential for capturing the structure and semantic information in heterogeneous graphs. However, existing GPU-based solutions, such as PyTorch Geometric, suffer from low GPU utilization due to numerous short-execution-time and memory-bound CUDA kernels during HGNN training. To address this issue, we introduce HiFuse, an enhancement for PyTorch Geom… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

  8. arXiv:2408.07719  [pdf, other

    cs.LG cs.AI

    Operator Feature Neural Network for Symbolic Regression

    Authors: Yusong Deng, Min Wu, Lina Yu, Jingyi Liu, Shu Wei, Yanjie Li, Weijun Li

    Abstract: Symbolic regression is a task aimed at identifying patterns in data and representing them through mathematical expressions, generally involving skeleton prediction and constant optimization. Many methods have achieved some success, however they treat variables and symbols merely as characters of natural language without considering their mathematical essence. This paper introduces the operator fea… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 12 pages

  9. arXiv:2408.07100  [pdf, other

    cs.LG cs.AI

    Pattern-Matching Dynamic Memory Network for Dual-Mode Traffic Prediction

    Authors: Wenchao Weng, Mei Wu, Hanyu Jiang, Wanzeng Kong, Xiangjie Kong, Feng Xia

    Abstract: In recent years, deep learning has increasingly gained attention in the field of traffic prediction. Existing traffic prediction models often rely on GCNs or attention mechanisms with O(N^2) complexity to dynamically extract traffic node features, which lack efficiency and are not lightweight. Additionally, these models typically only utilize historical data for prediction, without considering the… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  10. arXiv:2408.06631  [pdf, other

    cs.CL

    IFShip: A Large Vision-Language Model for Interpretable Fine-grained Ship Classification via Domain Knowledge-Enhanced Instruction Tuning

    Authors: Mingning Guo, Mengwei Wu, Yuxiang Shen, Haifeng Li, Chao Tao

    Abstract: End-to-end interpretation is currently the prevailing paradigm for remote sensing fine-grained ship classification (RS-FGSC) task. However, its inference process is uninterpretable, leading to criticism as a black box model. To address this issue, we propose a large vision-language model (LVLM) named IFShip for interpretable fine-grained ship classification. Unlike traditional methods, IFShip exce… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  11. arXiv:2408.02221  [pdf, other

    cs.CR

    SoK: Fighting Counterfeits with Cyber-Physical Synergy Based on Physically-Unclonable Identifiers of Paper Surface

    Authors: Anirudh Nakra, Min Wu, Chau-Wai Wong

    Abstract: Counterfeit products cause severe harm to public safety and health by penetrating untrusted supply chains. Numerous anti-counterfeiting techniques have been proposed, among which the use of inherent, unclonable irregularities of paper surfaces has shown considerable potential as a high-performance economical solution. Prior works do not consider supply chains cohesively, either focusing on creatin… ▽ More

    Submitted 4 August, 2024; originally announced August 2024.

  12. arXiv:2408.01902  [pdf, other

    cs.AR

    A Comprehensive Survey on GNN Characterization

    Authors: Meng Wu, Mingyu Yan, Wenming Li, Xiaochun Ye, Dongrui Fan, Yuan Xie

    Abstract: Characterizing graph neural networks (GNNs) is essential for identifying performance bottlenecks and facilitating their deployment. Despite substantial work in this area, a comprehensive survey on GNN characterization is lacking. This work presents a comprehensive survey, proposing a triple-level classification method to categorize, summarize, and compare existing efforts. In addition, we identify… ▽ More

    Submitted 15 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

  13. arXiv:2408.01413  [pdf, other

    cs.GT

    A Game Theoretic Analysis of High Occupancy Toll Lane Design

    Authors: Zhanhao Zhang, Ruifan Yang, Manxi Wu

    Abstract: In this article, we study the optimal design of High Occupancy Toll (HOT) lanes. The traffic authority determines the road capacity allocation between HOT lanes and ordinary lanes, as well as the toll price charged for travelers using HOT lanes who do not meet the high-occupancy eligibility criteria. We develop a game-theoretic model to analyze the decisions of travelers with heterogeneous prefere… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

  14. arXiv:2407.21534  [pdf, other

    cs.CV

    ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models

    Authors: Mingrui Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Xiaoshuai Sun, Rongrong Ji

    Abstract: In this work, we propose a training-free method to inject visual referring into Multimodal Large Language Models (MLLMs) through learnable visual token optimization. We observe the relationship between text prompt tokens and visual tokens in MLLMs, where attention layers model the connection between them. Our approach involves adjusting visual tokens from the MLP output during inference, controlli… ▽ More

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Code:https://github.com/mrwu-mac/ControlMLLM

  15. arXiv:2407.21311  [pdf, other

    cs.CV cs.AI cs.LG

    EUDA: An Efficient Unsupervised Domain Adaptation via Self-Supervised Vision Transformer

    Authors: Ali Abedi, Q. M. Jonathan Wu, Ning Zhang, Farhad Pourpanah

    Abstract: Unsupervised domain adaptation (UDA) aims to mitigate the domain shift issue, where the distribution of training (source) data differs from that of testing (target) data. Many models have been developed to tackle this problem, and recently vision transformers (ViTs) have shown promising results. However, the complexity and large number of trainable parameters of ViTs restrict their deployment in p… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 12 pages, 4 figures

  16. arXiv:2407.20462  [pdf, other

    cs.IR cs.LG

    Graphite: A Graph-based Extreme Multi-Label Short Text Classifier for Keyphrase Recommendation

    Authors: Ashirbad Mishra, Soumik Dey, Jinyu Zhao, Marshall Wu, Binbin Li, Kamesh Madduri

    Abstract: Keyphrase Recommendation has been a pivotal problem in advertising and e-commerce where advertisers/sellers are recommended keyphrases (search queries) to bid on to increase their sales. It is a challenging task due to the plethora of items shown on online platforms and various possible queries that users search while showing varying interest in the displayed items. Moreover, query/keyphrase recom… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  17. arXiv:2407.19852  [pdf

    quant-ph cs.LG q-bio.BM

    Quantum Long Short-Term Memory for Drug Discovery

    Authors: Liang Zhang, Yin Xu, Mohan Wu, Liang Wang, Hua Xu

    Abstract: Quantum computing combined with machine learning (ML) is an extremely promising research area, with numerous studies demonstrating that quantum machine learning (QML) is expected to solve scientific problems more effectively than classical ML. In this work, we successfully apply QML to drug discovery, showing that QML can significantly improve model performance and achieve faster convergence compa… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  18. arXiv:2407.19373  [pdf, other

    stat.ML cs.LG

    Uncertainty Quantification of Data Shapley via Statistical Inference

    Authors: Mengmeng Wu, Zhihong Liu, Xiang Li, Ruoxi Jia, Xiangyu Chang

    Abstract: As data plays an increasingly pivotal role in decision-making, the emergence of data markets underscores the growing importance of data valuation. Within the machine learning landscape, Data Shapley stands out as a widely embraced method for data valuation. However, a limitation of Data Shapley is its assumption of a fixed dataset, contrasting with the dynamic nature of real-world applications whe… ▽ More

    Submitted 27 July, 2024; originally announced July 2024.

  19. arXiv:2407.18391  [pdf, other

    cs.CV

    UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models

    Authors: Xinyu Pi, Mingyuan Wu, Jize Jiang, Haozhen Zheng, Beitong Tian, Chengxiang Zhai, Klara Nahrstedt, Zhiting Hu

    Abstract: Smaller-scale Vision-Langauge Models (VLMs) often claim to perform on par with larger models in general-domain visual grounding and question-answering benchmarks while offering advantages in computational efficiency and storage. However, their ability to handle rare objects, which fall into the long tail of data distributions, is less understood. To rigorously evaluate this aspect, we introduce th… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 10 pages

  20. arXiv:2407.17117  [pdf, other

    cs.LG cs.AI cs.CE

    EverAdapt: Continuous Adaptation for Dynamic Machine Fault Diagnosis Environments

    Authors: Edward, Mohamed Ragab, Yuecong Xu, Min Wu, Yuecong Xu, Zhenghua Chen, Abdulla Alseiari, Xiaoli Li

    Abstract: Unsupervised Domain Adaptation (UDA) has emerged as a key solution in data-driven fault diagnosis, addressing domain shift where models underperform in changing environments. However, under the realm of continually changing environments, UDA tends to underperform on previously seen domains when adapting to new ones - a problem known as catastrophic forgetting. To address this limitation, we introd… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  21. arXiv:2407.16541  [pdf, other

    cs.CV cs.MM

    QPT V2: Masked Image Modeling Advances Visual Scoring

    Authors: Qizhi Xie, Kun Yuan, Yunpeng Qu, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu

    Abstract: Quality assessment and aesthetics assessment aim to evaluate the perceived quality and aesthetics of visual content. Current learning-based methods suffer greatly from the scarcity of labeled data and usually perform sub-optimally in terms of generalization. Although masked image modeling (MIM) has achieved noteworthy advancements across various high-level tasks (e.g., classification, detection et… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures

  22. arXiv:2407.14355  [pdf, other

    cs.SD eess.AS

    Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

    Authors: Xuenan Xu, Pingyue Zhang, Ming Yan, Ji Zhang, Mengyue Wu

    Abstract: Zero-shot audio classification aims to recognize and classify a sound class that the model has never seen during training. This paper presents a novel approach for zero-shot audio classification using automatically generated sound attribute descriptions. We propose a list of sound attributes and leverage large language model's domain knowledge to generate detailed attribute descriptions for each c… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024

  23. arXiv:2407.14329  [pdf, other

    cs.SD eess.AS

    Efficient Audio Captioning with Encoder-Level Knowledge Distillation

    Authors: Xuenan Xu, Haohe Liu, Mengyue Wu, Wenwu Wang, Mark D. Plumbley

    Abstract: Significant improvement has been achieved in automated audio captioning (AAC) with recent models. However, these models have become increasingly large as their performance is enhanced. In this work, we propose a knowledge distillation (KD) framework for AAC. Our analysis shows that in the encoder-decoder based AAC models, it is more effective to distill knowledge into the encoder as compared with… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024

  24. arXiv:2407.13519  [pdf, other

    cs.CV

    GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding

    Authors: Changshuo Wang, Meiqing Wu, Siew-Kei Lam, Xin Ning, Shangshu Yu, Ruiping Wang, Weijun Li, Thambipillai Srikanthan

    Abstract: Despite the significant advancements in pre-training methods for point cloud understanding, directly capturing intricate shape information from irregular point clouds without reliance on external data remains a formidable challenge. To address this problem, we propose GPSFormer, an innovative Global Perception and Local Structure Fitting-based Transformer, which learns detailed shape information f… ▽ More

    Submitted 24 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  25. arXiv:2407.13198  [pdf, other

    cs.SD eess.AS

    DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation

    Authors: Baihan Li, Zeyu Xie, Xuenan Xu, Yiwei Guo, Ming Yan, Ji Zhang, Kai Yu, Mengyue Wu

    Abstract: Audio generation has attracted significant attention. Despite remarkable enhancement in audio quality, existing models overlook diversity evaluation. This is partially due to the lack of a systematic sound class diversity framework and a matching dataset. To address these issues, we propose DiveSound, a novel framework for constructing multimodal datasets with in-class diversified taxonomy, assist… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  26. arXiv:2407.08176  [pdf, other

    cs.SE cs.AI cs.LG

    Foundation Model Engineering: Engineering Foundation Models Just as Engineering Software

    Authors: Dezhi Ran, Mengzhou Wu, Wei Yang, Tao Xie

    Abstract: By treating data and models as the source code, Foundation Models (FMs) become a new type of software. Mirroring the concept of software crisis, the increasing complexity of FMs making FM crisis a tangible concern in the coming decade, appealing for new theories and methodologies from the field of software engineering. In this paper, we outline our vision of introducing Foundation Model (FM) engin… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by 2030 Software Engineering Workshop, co-located with FSE24; Invited to ACM TOSEM 2030 Roadmap for Software Engineering

  27. arXiv:2407.08150  [pdf, other

    cs.CV

    Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding

    Authors: Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, Ping Wang

    Abstract: Understanding of video creativity and content often varies among individuals, with differences in focal points and cognitive levels across different ages, experiences, and genders. There is currently a lack of research in this area, and most existing benchmarks suffer from several drawbacks: 1) a limited number of modalities and answers with restrictive length; 2) the content and scenarios within… ▽ More

    Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MULTIMEDIA 2024

  28. arXiv:2407.07200  [pdf, ps, other

    cs.RO

    Measuring Trust for Exoskeleton Systems

    Authors: Leia Stirling, Man I Wu, Xiangyu Peng

    Abstract: Wearable robotic systems are a class of robots that have a tight coupling between human and robot movements. Similar to non-wearable robots, it is important to measure the trust a person has that the robot can support achieving the desired goals. While some measures of trust may apply to all potential robotic roles, there are key distinctions between wearable and non-wearable robotic systems. In t… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Taking a Closer Look: Refining Trust and Its Impact in HRI Workshop, HRI '24, March 11, 2024

  29. arXiv:2407.06153  [pdf, other

    cs.SE cs.CL

    What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

    Authors: Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Weikang Zhou, Muling Wu, Mingxu Chai, Jessica Fan, Caishuang Huang, Yunbo Tao, Yan Liu, Enyu Zhou, Ming Zhang, Yuhao Zhou, Yueming Wu, Rui Zheng, Ming Wen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang

    Abstract: The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundar… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 17 pages, 7 figures

  30. arXiv:2407.04245  [pdf, other

    cs.CV

    Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization

    Authors: Ming-Yang Ho, Che-Ming Wu, Min-Sheng Wu, Yufeng Jane Tseng

    Abstract: Recent advancements in ultra-high-resolution unpaired image-to-image translation have aimed to mitigate the constraints imposed by limited GPU memory through patch-wise inference. Nonetheless, existing methods often compromise between the reduction of noticeable tiling artifacts and the preservation of color and hue contrast, attributed to the reliance on global image- or patch-level statistics in… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  31. arXiv:2407.02869  [pdf, other

    cs.SD eess.AS

    PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

    Authors: Zeyu Xie, Xuenan Xu, Zhizheng Wu, Mengyue Wu

    Abstract: Recently, audio generation tasks have attracted considerable research interests. Precise temporal controllability is essential to integrate audio generation with real applications. In this work, we propose a temporal controlled audio generation framework, PicoAudio. PicoAudio integrates temporal information to guide audio generation through tailored model design. It leverages data crawling, segmen… ▽ More

    Submitted 17 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    MSC Class: 68Txx ACM Class: I.2

  32. arXiv:2407.02857  [pdf, other

    cs.SD eess.AS

    AudioTime: A Temporally-aligned Audio-text Benchmark Dataset

    Authors: Zeyu Xie, Xuenan Xu, Zhizheng Wu, Mengyue Wu

    Abstract: Recent advancements in audio generation have enabled the creation of high-fidelity audio clips from free-form textual descriptions. However, temporal relationships, a critical feature for audio content, are currently underrepresented in mainstream models, resulting in an imprecise temporal controllability. Specifically, users cannot accurately control the timestamps of sound events using free-form… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    MSC Class: 68Txx ACM Class: I.2

  33. arXiv:2407.02098  [pdf, other

    cs.CV

    DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection

    Authors: Kaixin Xu, Qingtian Feng, Hao Chen, Zhe Wang, Xue Geng, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin

    Abstract: Applying deep neural networks to 3D point cloud processing has attracted increasing attention due to its advanced performance in many areas, such as AR/VR, autonomous driving, and robotics. However, as neural network models and 3D point clouds expand in size, it becomes a crucial challenge to reduce the computational and memory overhead to meet latency and energy constraints in real-world applicat… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  34. arXiv:2407.02068  [pdf, other

    cs.CV

    LPViT: Low-Power Semi-structured Pruning for Vision Transformers

    Authors: Kaixin Xu, Zhe Wang, Chunyun Chen, Xue Geng, Jie Lin, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin

    Abstract: Vision transformers have emerged as a promising alternative to convolutional neural networks for various image analysis tasks, offering comparable or superior performance. However, one significant drawback of ViTs is their resource-intensive nature, leading to increased memory footprint, computation complexity, and power consumption. To democratize this high-performance technology and make it more… ▽ More

    Submitted 12 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  35. arXiv:2407.02052  [pdf, other

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for The ICMC-ASR Challenge

    Authors: Minghui Wu, Luzhen Xu, Jie Zhang, Haitao Tang, Yanyan Yue, Ruizhi Liao, Jintao Zhao, Zhengzhe Zhang, Yichi Wang, Haoyin Yan, Hongliang Yu, Tongle Ma, Jiachen Liu, Chongliang Wu, Yongchao Li, Yanyong Zhang, Xin Fang, Yue Zhang

    Abstract: This report describes the submitted system to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which considers the ASR task with multi-speaker overlapping and Mandarin accent dynamics in the ICMC case. We implement the front-end speaker diarization using the self-supervised learning representation based multi-speaker embedding and beamforming using the speaker position,… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted at ICASSP 2024

  36. arXiv:2407.01284  [pdf, other

    cs.AI cs.CL cs.CV cs.LG cs.SC

    We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

    Authors: Runqi Qiao, Qiuna Tan, Guanting Dong, Minhui Wu, Chong Sun, Xiaoshuai Song, Zhuoma GongQue, Shanglin Lei, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Yifan Zhang, Xiao Zong, Yida Xu, Muxi Diao, Zhimin Bao, Chen Li, Honggang Zhang

    Abstract: Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks, such as MathVista and MathVerse, focus more on the result-oriented performance but neglect the underlying principles in knowledge acquisition and generalization. Inspired by human-like mathematical reasoning, we introduc… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress

  37. arXiv:2406.19941  [pdf, other

    cs.CV

    GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection

    Authors: Chih-Chung Hsu, Shao-Ning Chen, Mei-Hsuan Wu, Yi-Fang Wang, Chia-Ming Lee, Yi-Shiuan Chou

    Abstract: As DeepFake video manipulation techniques escalate, posing profound threats, the urgent need to develop efficient detection strategies is underscored. However, one particular issue lies with facial images being mis-detected, often originating from degraded videos or adversarial attacks, leading to unexpected temporal artifacts that can undermine the efficacy of DeepFake video detection techniques.… ▽ More

    Submitted 30 June, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: Submitted to TPAMI 2024

  38. arXiv:2406.17145  [pdf, other

    cs.DC cs.AI cs.LG

    GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

    Authors: Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

    Abstract: Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only c… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  39. arXiv:2406.16449  [pdf, other

    cs.CV

    Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

    Authors: Mingrui Wu, Jiayi Ji, Oucheng Huang, Jiale Li, Yuhang Wu, Xiaoshuai Sun, Rongrong Ji

    Abstract: The issue of hallucinations is a prevalent concern in existing Large Vision-Language Models (LVLMs). Previous efforts have primarily focused on investigating object hallucinations, which can be easily alleviated by introducing object detectors. However, these efforts neglect hallucinations in inter-object relationships, which is essential for visual comprehension. In this work, we introduce R-Benc… ▽ More

    Submitted 18 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: ICML2024; Project Page:https://github.com/mrwu-mac/R-Bench

  40. arXiv:2406.16062  [pdf, other

    cs.NE

    Towards Biologically Plausible Computing: A Comprehensive Comparison

    Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

    Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, gl… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  41. arXiv:2406.15501  [pdf

    cs.CR

    Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory

    Authors: Yang Li, Yujie Luo, Yichen Zhang, Ao Sun, Wei Huang, Shuai Zhang, Tao Zhang, Chuang Zhou, Li Ma, Jie Yang, Mei Wu, Heng Wang, Yan Pan, Yun Shao, Xing Chen, Ziyang Chen, Song Yu, Hong Guo, Bingjie Xu

    Abstract: Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), deteriorates the performance of time synchronization system seriously. Multiple paths scheme is thought as an effective security countermeasure to decrease the influence of TDA. However, the effective secure combination algorithm is still… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  42. arXiv:2406.14874  [pdf, other

    cs.CV

    TraceNet: Segment one thing efficiently

    Authors: Mingyuan Wu, Zichuan Liu, Haozhen Zheng, Hongpeng Guo, Bo Chen, Xin Lu, Klara Nahrstedt

    Abstract: Efficient single instance segmentation is essential for unlocking features in the mobile imaging applications, such as capture or editing. Existing on-the-fly mobile imaging applications scope the segmentation task to portraits or the salient subject due to the computational constraints. Instance segmentation, despite its recent developments towards efficient networks, is still heavy due to the co… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  43. arXiv:2406.14844  [pdf, other

    cs.LG cs.AI

    DN-CL: Deep Symbolic Regression against Noise via Contrastive Learning

    Authors: Jingyi Liu, Yanjie Li, Lina Yu, Min Wu, Weijun Li, Wenqiang Li, Meilan Hao, Yusong Deng, Shu Wei

    Abstract: Noise ubiquitously exists in signals due to numerous factors including physical, electronic, and environmental effects. Traditional methods of symbolic regression, such as genetic programming or deep learning models, aim to find the most fitting expressions for these signals. However, these methods often overlook the noise present in real-world data, leading to reduced fitting accuracy. To tackle… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  44. arXiv:2406.14457  [pdf, other

    cs.AI

    Rewarding What Matters: Step-by-Step Reinforcement Learning for Task-Oriented Dialogue

    Authors: Huifang Du, Shuqin Li, Minghao Wu, Xuejing Feng, Yuan-Fang Li, Haofen Wang

    Abstract: Reinforcement learning (RL) is a powerful approach to enhance task-oriented dialogue (TOD) systems. However, existing RL methods tend to mainly focus on generation tasks, such as dialogue policy learning (DPL) or response generation (RG), while neglecting dialogue state tracking (DST) for understanding. This narrow focus limits the systems to achieve globally optimal performance by overlooking the… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  45. arXiv:2406.14092  [pdf, other

    cs.CL eess.AS

    Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models

    Authors: Jing Xu, Minglin Wu, Xixin Wu, Helen Meng

    Abstract: Self-supervised (SSL) models have shown great performance in various downstream tasks. However, they are typically developed for limited languages, and may encounter new languages in real-world. Developing a SSL model for each new language is costly. Thus, it is vital to figure out how to efficiently adapt existed SSL models to a new language without impairing its original abilities. We propose ad… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  46. arXiv:2406.13951  [pdf, other

    cs.CV

    Towards the in-situ Trunk Identification and Length Measurement of Sea Cucumbers via Bézier Curve Modelling

    Authors: Shuaixin Liu, Kunqian Li, Yilin Ding, Kuangwei Xu, Qianli Jiang, Q. M. Jonathan Wu, Dalei Song

    Abstract: We introduce a novel vision-based framework for in-situ trunk identification and length measurement of sea cucumbers, which plays a crucial role in the monitoring of marine ranching resources and mechanized harvesting. To model sea cucumber trunk curves with varying degrees of bending, we utilize the parametric Bézier curve due to its computational simplicity, stability, and extensive range of tra… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  47. arXiv:2406.13317  [pdf, other

    cs.CV

    M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere

    Authors: Mengqiu Xu, Ming Wu, Kaixin Chen, Yixiang Huang, Mingrui Xu, Yujia Yang, Yiqing Feng, Yiying Guo, Bin Huang, Dongliang Chang, Zhenwei Shi, Chuang Zhang, Zhanyu Ma, Jun Guo

    Abstract: Marine fog poses a significant hazard to global shipping, necessitating effective detection and forecasting to reduce economic losses. In recent years, several machine learning (ML) methods have demonstrated superior detection accuracy compared to traditional meteorological methods. However, most of these works are developed on proprietary datasets, and the few publicly accessible datasets are oft… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  48. arXiv:2406.13007  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, Jingyuan Xiao , et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce a photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  49. arXiv:2406.11410  [pdf, other

    cs.CL cs.AI

    HARE: HumAn pRiors, a key to small language model Efficiency

    Authors: Lingyun Zhang, Bin jin, Gaojian Ge, Lunhui Liu, Xuewen Shen, Mingyong Wu, Houqian Zhang, Yongneng Jiang, Shiqi Chen, Shi Pu

    Abstract: Human priors play a crucial role in efficiently utilizing data in deep learning. However, with the development of large language models (LLMs), there is an increasing emphasis on scaling both model size and data volume, which often diminishes the importance of human priors in data construction. Influenced by these trends, existing Small Language Models (SLMs) mainly rely on web-scraped large-scale… ▽ More

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  50. arXiv:2406.10976  [pdf, other

    cs.LG cs.CL cs.CR

    Promoting Data and Model Privacy in Federated Learning through Quantized LoRA

    Authors: JianHao Zhu, Changze Lv, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Zixuan Ling, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

    Abstract: Conventional federated learning primarily aims to secure the privacy of data distributed across multiple edge devices, with the global model dispatched to edge devices for parameter updates during the learning process. However, the development of large language models (LLMs) requires substantial data and computational resources, rendering them valuable intellectual properties for their developers… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.