Showing 1–50 of 2,372 results for author: Chen, L

Searching in archive cs.
  1. arXiv:2409.03403  [pdf, other]

    cs.RO

    RoVi-Aug: Robot and Viewpoint Augmentation for Cross-Embodiment Robot Learning

    Authors: Lawrence Yunliang Chen, Chenfeng Xu, Karthik Dharmarajan, Zubair Irshad, Richard Cheng, Kurt Keutzer, Masayoshi Tomizuka, Quan Vuong, Ken Goldberg

    Abstract: Scaling up robot learning requires large and diverse datasets, and how to efficiently reuse collected data and transfer policies to new embodiments remains an open question. Emerging research such as the Open-X Embodiment (OXE) project has shown promise in leveraging skills by combining datasets including different robots. However, imbalances in the distribution of robot types and camera angles in…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: CoRL 2024 (Oral)

  2. arXiv:2409.03198  [pdf, other]

    cs.CV

    RoomDiffusion: A Specialized Diffusion Model in the Interior Design Industry

    Authors: Zhaowei Wang, Ying Hao, Hao Wei, Qing Xiao, Lulu Chen, Yulong Li, Yue Yang, Tianyi Li

    Abstract: Recent advancements in text-to-image diffusion models have significantly transformed visual content generation, yet their application in specialized fields such as interior design remains underexplored. In this paper, we present RoomDiffusion, a pioneering diffusion model meticulously tailored for the interior design industry. To begin with, we build from scratch a whole data pipeline to update an…

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2409.03179  [pdf, other]

    eess.IV cs.CV

    Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem

    Authors: Qiwen Zhu, Yanjie Wang, Shilv Cai, Liqun Chen, Jiahuan Zhou, Luxin Yan, Sheng Zhong, Xu Zou

    Abstract: Training Single-Image Super-Resolution (SISR) models using pixel-based regression losses can achieve high distortion metrics scores (e.g., PSNR and SSIM), but often results in blurry images due to insufficient recovery of high-frequency details. Conversely, using GAN or perceptual losses can produce sharp images with high perceptual metric scores (e.g., LPIPS), but may introduce artifacts and inco…

    Submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2409.02795  [pdf, other]

    cs.CL

    Towards a Unified View of Preference Learning for Large Language Models: A Survey

    Authors: Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang

    Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to unde…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Initial Commit, 21 pages

  5. Assembling the Puzzle: Exploring Collaboration and Data Sensemaking in Nursing Practices for Remote Patient Monitoring

    Authors: Mihnea Calota, Janet Yi-Ching Huang, Lin-Lin Chen, Mathias Funk

    Abstract: Remote patient monitoring (RPM) involves the remote collection and transmission of patient health data, serving as a notable application of data-driven healthcare. This technology facilitates clinical monitoring and decision-making, offering benefits like reduced healthcare costs and improved patient outcomes. However, RPM also introduces challenges common to data-driven healthcare, such as additi…

    Submitted 5 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

  6. arXiv:2409.01548  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka

    Authors: Li-Wei Chen, Hung-Shin Lee, Chen-Chi Chang

    Abstract: This paper introduces VoxHakka, a text-to-speech (TTS) system designed for Taiwanese Hakka, a critically under-resourced language spoken in Taiwan. Leveraging the YourTTS framework, VoxHakka achieves high naturalness and accuracy and low real-time factor in speech synthesis while supporting six distinct Hakka dialects. This is achieved by training the model with dialect-specific data, allowing for…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Submitted to O-COCOSDA 2024

  7. arXiv:2409.01545  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation

    Authors: Chien-Chun Wang, Li-Wei Chen, Hung-Shin Lee, Berlin Chen, Hsin-Min Wang

    Abstract: Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain, leading to a mismatch between training and test conditions. This study puts forward a novel data simulation method to address this issue, leveraging noise-extractive techniques and generative adversarial networks (GANs) with only limited tar…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted to IEEE SLT 2024

  8. arXiv:2409.01199  [pdf, other]

    cs.CV eess.IV

    OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

    Authors: Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinghua Cheng, Li Yuan

    Abstract: Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). With the same reconstruction quality, the more sufficient the VAE's compression for videos is, the more efficient the LVDMs are. However, most LVDMs utilize 2D image VAE, whose compression for videos is only in the spatial dimension and often ign…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: https://github.com/PKU-YuanGroup/Open-Sora-Plan

  9. ProphetFuzz: Fully Automated Prediction and Fuzzing of High-Risk Option Combinations with Only Documentation via Large Language Model

    Authors: Dawei Wang, Geng Zhou, Li Chen, Dan Li, Yukai Miao

    Abstract: Vulnerabilities related to option combinations pose a significant challenge in software security testing due to their vast search space. Previous research primarily addressed this challenge through mutation or filtering techniques, which inefficiently treated all option combinations as having equal potential for vulnerabilities, thus wasting considerable time on non-vulnerable targets and resultin…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: Preprint

  10. arXiv:2408.16756  [pdf, other]

    cs.CL

    How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models

    Authors: Jiyue Jiang, Liheng Chen, Pengan Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu

    Abstract: The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages. However, underrepresented languages like Cantonese, spoken by over 85 million people, face significant development gaps, which is particularly concerning given the economic significance of the Guangdong-Hong Kong…

    Submitted 29 August, 2024; originally announced August 2024.

  11. arXiv:2408.16498  [pdf, other]

    cs.SE

    A Survey on Evaluating Large Language Models in Code Generation Tasks

    Authors: Liguo Chen, Qi Guo, Hongrui Jia, Zhengran Zeng, Xin Wang, Yijiang Xu, Jian Wu, Yidong Wang, Qing Gao, Jindong Wang, Wei Ye, Shikun Zhang

    Abstract: This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development, LLMs have demonstrated significant potential in the field of code generation. The paper begins by reviewing the historical development of LLMs and their applicatio…

    Submitted 29 August, 2024; originally announced August 2024.

  12. arXiv:2408.16420  [pdf, other]

    cs.RO

    Time-Optimized Trajectory Planning for Non-Prehensile Object Transportation in 3D

    Authors: Lingyun Chen, Haoyu Yu, Abdeldjallil Naceri, Abdalla Swikir, Sami Haddadin

    Abstract: Non-prehensile object transportation offers a way to enhance robotic performance in object manipulation tasks, especially with unstable objects. Effective trajectory planning requires simultaneous consideration of robot motion constraints and object stability. Here, we introduce a physical model for object stability and propose a novel trajectory planning approach for non-prehensile transportation…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted to the European Robotic Forum (ERF) 2024

  13. arXiv:2408.16266  [pdf, other]

    cs.CV

    Improving Diffusion-based Data Augmentation with Inversion Spherical Interpolation

    Authors: Yanghao Wang, Long Chen

    Abstract: Data Augmentation (DA), i.e., synthesizing faithful and diverse samples to expand the original training set, is a prevalent and effective strategy to improve various visual recognition tasks. With the powerful image generation ability, diffusion-based DA has shown strong performance gains on different benchmarks. In this paper, we analyze today's diffusion-based DA methods, and argue that they cann…

    Submitted 29 August, 2024; originally announced August 2024.

  14. arXiv:2408.15980  [pdf, other]

    cs.RO cs.AI

    In-Context Imitation Learning via Next-Token Prediction

    Authors: Letian Fu, Huang Huang, Gaurav Datta, Lawrence Yunliang Chen, William Chung-Ho Panitch, Fangchen Liu, Hui Li, Ken Goldberg

    Abstract: We explore how to enhance next-token prediction models to perform in-context imitation learning on a real robot, where the robot executes new tasks by interpreting contextual information provided during the input phase, without updating its underlying policy parameters. We propose In-Context Robot Transformer (ICRT), a causal transformer that performs autoregressive prediction on sensorimotor traj…

    Submitted 28 August, 2024; originally announced August 2024.

  15. arXiv:2408.15881  [pdf, other]

    cs.CV

    LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

    Authors: Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Bolin Li, Zhelun Yu, Si Liu, Hongsheng Li, Hao Jiang

    Abstract: We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM). Our approach tackles two fundamental challenges in MLLM distillation. First, we optimize the network structure of s-MLLM by integrating a sparse Mixture of Experts (MoE) architecture into the language model, s…

    Submitted 28 August, 2024; originally announced August 2024.

  16. arXiv:2408.15657  [pdf, other]

    cs.CV cs.RO

    TeFF: Tracking-enhanced Forgetting-free Few-shot 3D LiDAR Semantic Segmentation

    Authors: Junbao Zhou, Jilin Mei, Pengze Wu, Liang Chen, Fangzhou Zhao, Xijun Zhao, Yu Hu

    Abstract: In autonomous driving, 3D LiDAR plays a crucial role in understanding the vehicle's surroundings. However, newly emerged, unannotated objects present a few-shot learning problem for semantic segmentation. This paper addresses the limitations of current few-shot semantic segmentation by exploiting the temporal continuity of LiDAR data. Employing a tracking model to generate pseudo-ground-truths…

    Submitted 28 August, 2024; originally announced August 2024.

  17. arXiv:2408.14438  [pdf, other]

    cs.CL cs.CY

    Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study

    Authors: Liuchang Xu, Shuo Zhao, Qingming Lin, Luyao Chen, Qianqian Luo, Sensen Wu, Xinyue Ye, Hailin Feng, Zhenhong Du

    Abstract: The advent of large language models such as ChatGPT, Gemini, and others has underscored the importance of evaluating their diverse capabilities, ranging from natural language understanding to code generation. However, their performance on spatial tasks has not been comprehensively assessed. This study addresses this gap by introducing a novel multi-task spatial evaluation dataset, designed to syst…

    Submitted 2 September, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  18. arXiv:2408.14211  [pdf, other]

    cs.CV cs.AI

    MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement

    Authors: Xu He, Xiaoyu Li, Di Kang, Jiangnan Ye, Chaopeng Zhang, Liyang Chen, Xiangjun Gao, Han Zhang, Zhiyong Wu, Haolin Zhuang

    Abstract: Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data or 3D inconsistencies for a lack of comprehensive multi-view knowledge. In this paper, we introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image. At its core, we leverage a pre-trained 2D d…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Project Page: https://thuhcsi.github.io/MagicMan

  19. arXiv:2408.14173  [pdf, other]

    cs.CV

    BackFlip: The Impact of Local and Global Data Augmentations on Artistic Image Aesthetic Assessment

    Authors: Ombretta Strafforello, Gonzalo Muradas Odriozola, Fatemeh Behrad, Li-Wei Chen, Anne-Sofie Maerten, Derya Soydaner, Johan Wagemans

    Abstract: Assessing the aesthetic quality of artistic images presents unique challenges due to the subjective nature of aesthetics and the complex visual characteristics inherent to artworks. Basic data augmentation techniques commonly applied to natural images in computer vision may not be suitable for art images in aesthetic evaluation tasks, as they can change the composition of the art images. In this p…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Published at the VISART VII workshop at ECCV 2024. Ombretta Strafforello, Gonzalo Muradas Odriozola, Fatemeh Behrad, Li-Wei Chen, Anne-Sofie Maerten and Derya Soydaner contributed equally to this work

  20. arXiv:2408.13044  [pdf, other]

    cs.RO

    Identification and validation of the dynamic model of a tendon-driven anthropomorphic finger

    Authors: Junnan Li, Lingyun Chen, Johannes Ringwald, Edmundo Pozo Fortunic, Amartya Ganguly, Sami Haddadin

    Abstract: This study addresses the absence of an identification framework to quantify a comprehensive dynamic model of human and anthropomorphic tendon-driven fingers, which is necessary to investigate the physiological properties of human fingers and improve the control of robotic hands. First, a generalized dynamic model was formulated, which takes into account the inherent properties of such a mechanical…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 8 pages, 9 figures

  21. arXiv:2408.12981  [pdf, other]

    cs.AI

    QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval

    Authors: Chenghua Gao, Min Li, Jianshuo Liu, Junxing Ren, Lin Chen, Haoyu Liu, Bo Meng, Jitao Fu, Wenwen Su

    Abstract: Video Moment Retrieval (VMR) aims to retrieve relevant moments of an untrimmed video corresponding to the query. While cross-modal interaction approaches have shown progress in filtering out query-irrelevant information in videos, they assume the precise alignment between the query semantics and the corresponding video moments, potentially overlooking the misunderstanding of the natural language s…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 figures, 4 tables

  22. arXiv:2408.12879  [pdf, other]

    cs.CV cs.AI

    Frequency-aware Feature Fusion for Dense Image Prediction

    Authors: Linwei Chen, Ying Fu, Lin Gu, Chenggang Yan, Tatsuya Harada, Gao Huang

    Abstract: Dense image prediction tasks demand features with strong category information and precise spatial boundary details at high resolution. To achieve this, modern hierarchical models often utilize feature fusion, directly adding upsampled coarse features from deep layers and high-resolution features from lower levels. In this paper, we observe rapid variations in fused feature values within objects, r…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by TPAMI (2024)

  23. arXiv:2408.12857  [pdf, other]

    cs.LG cs.AI cs.CL

    Memory-Efficient LLM Training with Online Subspace Descent

    Authors: Kaizhao Liang, Bo Liu, Lizhang Chen, Qiang Liu

    Abstract: Recently, a wide range of memory-efficient LLM training algorithms have gained substantial popularity. These methods leverage the low-rank structure of gradients to project optimizer states into a subspace using a projection matrix found by singular value decomposition (SVD). However, convergence of these algorithms is highly dependent on the update rules of their projection matrix. In this work, we…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Code is available at https://github.com/kyleliang919/Online-Subspace-Descent

  24. arXiv:2408.12527  [pdf, other]

    cs.RO cs.CV

    UMAD: University of Macau Anomaly Detection Benchmark Dataset

    Authors: Dong Li, Lineng Chen, Cheng-Zhong Xu, Hui Kong

    Abstract: Anomaly detection is critical in surveillance systems and patrol robots by identifying anomalous regions in images for early warning. Depending on whether reference data are utilized, anomaly detection can be categorized into anomaly detection with reference and anomaly detection without reference. Currently, anomaly detection without reference, which is closely related to out-of-distribution (OoD…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024, project code at https://github.com/IMRL/UMAD

  25. arXiv:2408.12526  [pdf, other]

    cs.LG

    Exploiting Student Parallelism for Low-latency GPU Inference of BERT-like Models in Online Services

    Authors: Weiyan Wang, Yilun Jin, Yiming Zhang, Victor Junqiu Wei, Han Tian, Li Chen, Kai Chen

    Abstract: Due to high accuracy, BERT-like models have been widely adopted by discriminative text mining and web searching. However, large BERT-like models suffer from inefficient online inference, as they face the following two problems on GPUs. First, they rely on the large model depth to achieve high accuracy, which linearly increases the sequential computation on GPUs. Second, stochastic and dynamic onli…

    Submitted 22 August, 2024; originally announced August 2024.

  26. arXiv:2408.11824   

    cs.HC cs.AI

    AppAgent v2: Advanced Agent for Flexible Mobile Interactions

    Authors: Yanda Li, Chi Zhang, Wanqi Yang, Bin Fu, Pei Cheng, Xin Chen, Ling Chen, Yunchao Wei

    Abstract: With the advancement of Multimodal Large Language Models (MLLM), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal agent framework for mobile devices. This framework, capable of navigating mobile devices, emulates human-like interactions. Our agent constructs a flexible actio…

    Submitted 23 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: Pre-print version, some content needs to be supplemented

  27. arXiv:2408.11048  [pdf, other]

    cs.RO cs.AI cs.LG

    RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands

    Authors: Yi Zhao, Le Chen, Jan Schneider, Quankai Gao, Juho Kannala, Bernhard Schölkopf, Joni Pajarinen, Dieter Büchler

    Abstract: It has been a long-standing research goal to endow robot hands with human-level dexterity. Bi-manual robot piano playing constitutes a task that combines challenges from dynamic tasks, such as generating fast yet precise motions, with slower but contact-rich manipulation problems. Although reinforcement learning based approaches have shown promising results in single-task performance, these meth…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project Website: https://rp1m.github.io/

  28. arXiv:2408.10198  [pdf, other]

    cs.CV cs.GR

    MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

    Authors: Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su

    Abstract: Open-world 3D reconstruction models have recently garnered significant attention. However, without sufficient 3D inductive bias, existing methods typically entail expensive training costs and struggle to extract high-quality 3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. S…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 20 pages, 9 figures

  29. arXiv:2408.10195  [pdf, other]

    cs.CV cs.AI cs.GR

    SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

    Authors: Chao Xu, Ang Li, Linghao Chen, Yulin Liu, Ruoxi Shi, Hao Su, Minghua Liu

    Abstract: Open-world 3D generation has recently attracted considerable attention. While many single-image-to-3D methods have yielded visually appealing outcomes, they often lack sufficient controllability and tend to produce hallucinated regions that may not align with users' expectations. In this paper, we explore an important scenario in which the input consists of one or a few unposed 2D images of a sing…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  30. arXiv:2408.09858  [pdf, ps, other]

    cs.LG cs.AR

    ShortCircuit: AlphaZero-Driven Circuit Design

    Authors: Dimitrios Tsaras, Antoine Grosnit, Lei Chen, Zhiyao Xie, Haitham Bou-Ammar, Mingxuan Yuan

    Abstract: Chip design relies heavily on generating Boolean circuits, such as AND-Inverter Graphs (AIGs), from functional descriptions like truth tables. While recent advances in deep learning have aimed to accelerate circuit design, these efforts have mostly focused on tasks other than synthesis, and traditional heuristic methods have plateaued. In this paper, we introduce ShortCircuit, a novel transformer-…

    Submitted 19 August, 2024; originally announced August 2024.

  31. arXiv:2408.08295  [pdf, other]

    cs.CV cs.AI cs.LG

    SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training

    Authors: Gengwei Zhang, Liyuan Wang, Guoliang Kang, Ling Chen, Yunchao Wei

    Abstract: In recent years, continual learning with pre-training (CLPT) has received widespread interest, instead of its traditional focus of training from scratch. The use of strong pre-trained models (PTMs) can greatly facilitate knowledge transfer and alleviate catastrophic forgetting, but also suffers from progressive overfitting of pre-trained knowledge into specific downstream tasks. A majority of curr…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: This paper is an extension of our ICCV 23 paper (arXiv:2303.05118)

  32. arXiv:2408.08243  [pdf, other]

    quant-ph cs.NI

    From Entanglement Purification Scheduling to Fidelity-constrained Multi-Flow Routing

    Authors: Ziyue Jia, Lin Chen

    Abstract: Recently emerged as a disruptive networking paradigm, quantum networks rely on the mysterious quantum entanglement to teleport qubits without physically transferring quantum particles. However, the state of quantum systems is extremely fragile due to environment noise. A promising technique to combat quantum decoherence is entanglement purification. To fully exploit its benefit, two fundam…

    Submitted 22 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: 15 pages, 12 figures

  33. arXiv:2408.08078  [pdf, other]

    cs.CV cs.AI

    Treat Stillness with Movement: Remote Sensing Change Detection via Coarse-grained Temporal Foregrounds Mining

    Authors: Xixi Wang, Zitian Wang, Jingtao Jiang, Lan Chen, Xiao Wang, Bo Jiang

    Abstract: Current works focus on addressing the remote sensing change detection task using bi-temporal images. Although good performance can be achieved, few of them consider the motion cues, which may also be vital. In this work, we revisit the widely adopted bi-temporal images-based framework and propose a novel Coarse-grained Temporal Mining Augmented (CTMA) framework. To be specific, given th…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: In Peer Review

  34. arXiv:2408.07999  [pdf, other]

    cs.CV

    Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement

    Authors: Wenxuan Li, Qin Zou, Chi Chen, Bo Du, Long Chen

    Abstract: In the realm of autonomous driving, accurately detecting occluded or distant objects, referred to as weak positive samples, presents significant challenges. These challenges predominantly arise during query initialization, where an over-reliance on heatmap confidence often results in a high rate of false positives, consequently masking weaker detections and impairing system performance. To alleviate…

    Submitted 15 August, 2024; originally announced August 2024.

  35. arXiv:2408.06891  [pdf]

    cs.AI cs.CE cs.CV cs.LG

    Automatic Feature Recognition and Dimensional Attributes Extraction From CAD Models for Hybrid Additive-Subtractive Manufacturing

    Authors: Muhammad Tayyab Khan, Wenhe Feng, Lequn Chen, Ye Han Ng, Nicholas Yew Jin Tan, Seung Ki Moon

    Abstract: The integration of Computer-Aided Design (CAD), Computer-Aided Process Planning (CAPP), and Computer-Aided Manufacturing (CAM) plays a crucial role in modern manufacturing, facilitating seamless transitions from digital designs to physical products. However, a significant challenge within this integration is the Automatic Feature Recognition (AFR) of CAD models, especially in the context of hybrid…

    Submitted 14 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 10 pages, 12 figures. This paper has been accepted for presentation at the ASME IDETC-CIE 2024 conference

  36. arXiv:2408.06743  [pdf, other]

    cs.LG

    Class-aware and Augmentation-free Contrastive Learning from Label Proportion

    Authors: Jialiang Wang, Ning Zhang, Shimin Di, Ruidong Wang, Lei Chen

    Abstract: Learning from Label Proportion (LLP) is a weakly supervised learning scenario in which training data is organized into predefined bags of instances, disclosing only the class label proportions per bag. This paradigm is essential for user modeling and personalization, where user privacy is paramount, offering insights into user preferences without revealing individual data. LLP faces a unique diffi…

    Submitted 13 August, 2024; originally announced August 2024.

  37. arXiv:2408.06717  [pdf, other]

    cs.LG cs.AI

    Computation-friendly Graph Neural Network Design by Accumulating Knowledge on Large Language Models

    Authors: Jialiang Wang, Shimin Di, Hanmo Liu, Zhili Wang, Jiachuan Wang, Lei Chen, Xiaofang Zhou

    Abstract: Graph Neural Networks (GNNs), like other neural networks, have shown remarkable success but are hampered by the complexity of their architecture designs, which heavily depend on specific data and tasks. Traditionally, designing proper architectures involves trial and error, which requires intensive manual effort to optimize various components. To reduce human workload, researchers try to develop a…

    Submitted 13 August, 2024; originally announced August 2024.

  38. arXiv:2408.06568  [pdf, other]

    cs.SE

    MORCoRA: Multi-Objective Refactoring Recommendation Considering Review Availability

    Authors: Lei Chen, Shinpei Hayashi

    Abstract: Background: Search-based refactoring involves searching for a sequence of refactorings to achieve specific objectives. Although a typical objective is improving code quality, a different perspective is also required; the searched sequence must undergo review before being applied and may not be applied if the review fails or is postponed due to no proper reviewers. Aim: Therefore, it is essential t…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Preprint of an article accepted to be published in International Journal of Software Engineering and Knowledge Engineering, (C) 2024 World Scientific Publishing Company, https://www.worldscientific.com/worldscinet/ijseke

  39. arXiv:2408.05897  [pdf, other]

    cs.HC

    TRIZ-GPT: An LLM-augmented method for problem-solving

    Authors: Liuqing Chen, Yaxuan Song, Shixian Ding, Lingyun Sun, Peter Childs, Haoyu Zuo

    Abstract: TRIZ, the Theory of Inventive Problem Solving, is derived from a comprehensive analysis of patents across various domains, offering a framework and practical tools for problem-solving. Despite its potential to foster innovative solutions, the complexity and abstractness of TRIZ methodology often make its acquisition and application challenging. This often requires users to have a deep understandin…

    Submitted 11 August, 2024; originally announced August 2024.

  40. arXiv:2408.05778  [pdf, other]

    cs.LG math.OC

    Pareto Front Shape-Agnostic Pareto Set Learning in Multi-Objective Optimization

    Authors: Rongguang Ye, Longcan Chen, Wei-Bin Kou, Jinyuan Zhang, Hisao Ishibuchi

    Abstract: Pareto set learning (PSL) is an emerging approach for acquiring the complete Pareto set of a multi-objective optimization problem. Existing methods primarily rely on the mapping of preference vectors in the objective space to Pareto optimal solutions in the decision space. However, the sampling of preference vectors theoretically requires prior knowledge of the Pareto front shape to ensure high pe…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 7 pages

    Journal ref: IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC 2024)

  41. arXiv:2408.05699  [pdf, other]

    cs.CV

    MacFormer: Semantic Segmentation with Fine Object Boundaries

    Authors: Guoan Xu, Wenfeng Huang, Tao Wu, Ligeng Chen, Wenjing Jia, Guangwei Gao, Xiatian Zhu, Stuart Perry

    Abstract: Semantic segmentation involves assigning a specific category to each pixel in an image. While Vision Transformer-based models have made significant progress, current semantic segmentation methods often struggle with precise predictions in localized areas like object boundaries. To tackle this challenge, we introduce a new semantic segmentation architecture, "MacFormer", which features two key co…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 13 pages, 7 figures, submitted to TIP

  42. arXiv:2408.05584  [pdf]

    cs.LG stat.ME

    Dynamical causality under invisible confounders

    Authors: Jinling Yan, Shao-Wu Zhang, Chihao Zhang, Weitian Huang, Jifan Shi, Luonan Chen

    Abstract: Causality inference is prone to spurious causal interactions, due to the substantial confounders in a complex system. While many existing methods based on the statistical methods or dynamical methods attempt to address misidentification challenges, there remains a notable lack of effective methods to infer causality, in particular in the presence of invisible/unobservable confounders. As a result,…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 23 pages, 5 figures

  43. arXiv:2408.05307  [pdf]

    cs.CE cs.LG

    Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing

    Authors: Jiarui Xie, Mutahar Safdar, Lequn Chen, Seung Ki Moon, Yaoyao Fiona Zhao

    Abstract: Various machine learning (ML)-based in-situ monitoring systems have been developed to detect laser additive manufacturing (LAM) process anomalies and defects. Multimodal fusion can improve in-situ monitoring performance by acquiring and integrating data from multiple modalities, including visual and audio data. However, multimodal fusion employs multiple sensors of different types, which leads to…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 36 pages, 12 figures, 6 tables

  44. arXiv:2408.03957  [pdf, other]

    cs.NI cs.IT cs.LG eess.SP

    GNN-Based Joint Channel and Power Allocation in Heterogeneous Wireless Networks

    Authors: Lili Chen, Jingge Zhu, Jamie Evans

    Abstract: The optimal allocation of channels and power resources plays a crucial role in ensuring minimal interference, maximal data rates, and efficient energy utilisation. As a successful approach for tackling resource management problems in wireless networks, Graph Neural Networks (GNNs) have attracted a lot of attention. This article proposes a GNN-based algorithm to address the joint resource allocatio…

    Submitted 28 July, 2024; originally announced August 2024.

  45. arXiv:2408.03771  [pdf]

    cs.CV

    Methodological Explainability Evaluation of an Interpretable Deep Learning Model for Post-Hepatectomy Liver Failure Prediction Incorporating Counterfactual Explanations and Layerwise Relevance Propagation: A Prospective In Silico Trial

    Authors: Xian Zhong, Zohaib Salahuddin, Yi Chen, Henry C Woodruff, Haiyi Long, Jianyun Peng, Nuwan Udawatte, Roberto Casale, Ayoub Mokhtari, Xiaoer Zhang, Jiayao Huang, Qingyu Wu, Li Tan, Lili Chen, Dongming Li, Xiaoyan Xie, Manxia Lin, Philippe Lambin

    Abstract: Artificial intelligence (AI)-based decision support systems have demonstrated value in predicting post-hepatectomy liver failure (PHLF) in hepatocellular carcinoma (HCC). However, they often lack transparency, and the impact of model explanations on clinicians' decisions has not been thoroughly evaluated. Building on prior research, we developed a variational autoencoder-multilayer perceptron (VAE…

    Submitted 7 August, 2024; originally announced August 2024.

  46. arXiv:2408.03394  [pdf, other]

    cs.RO

    Faster Model Predictive Control via Self-Supervised Initialization Learning

    Authors: Zhaoxin Li, Letian Chen, Rohan Paleja, Subramanya Nageshrao, Matthew Gombolay

    Abstract: Optimization for robot control tasks, spanning various methodologies, includes Model Predictive Control (MPC). However, the complexity of the system, such as non-convex and non-differentiable cost functions and prolonged planning horizons often drastically increases the computation time, limiting MPC's real-world applicability. Prior works in speeding up the optimization have limitations on solvin…

    Submitted 6 August, 2024; originally announced August 2024.

  47. arXiv:2408.02999  [pdf, other]

    cs.FL cs.AI

    LLMs as Probabilistic Minimally Adequate Teachers for DFA Learning

    Authors: Lekai Chen, Ashutosh Trivedi, Alvaro Velasquez

    Abstract: The emergence of intelligence in large language models (LLMs) has inspired investigations into their integration into automata learning. This paper introduces the probabilistic Minimally Adequate Teacher (pMAT) formulation, which leverages a probabilistic oracle that could give persistent errors randomly during answering the membership queries for deterministic finite automata (DFA) learning. Give…

    Submitted 6 August, 2024; originally announced August 2024.

  48. arXiv:2408.02293  [pdf, other]

    cs.RO eess.SY

    OPENGRASP-LITE Version 1.0: A Tactile Artificial Hand with a Compliant Linkage Mechanism

    Authors: Sonja Groß, Michael Ratzel, Edgar Welte, Diego Hidalgo-Carvajal, Lingyun Chen, Edmundo Pozo Fortunić, Amartya Ganguly, Abdalla Swikir, Sami Haddadin

    Abstract: Recent research has seen notable progress in the development of linkage-based artificial hands. While previous designs have focused on adaptive grasping, dexterity and biomimetic artificial skin, only a few systems have proposed a lightweight, accessible solution integrating tactile sensing with a compliant linkage-based mechanism. This paper introduces OPENGRASP LITE, an open-source, highly integ…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted at IEEE/RSJ International Conference on Intelligent Robots and Systems, 14-18 October 2024

  49. arXiv:2408.01976  [pdf, other]

    cs.CV

    Single-Point Supervised High-Resolution Dynamic Network for Infrared Small Target Detection

    Authors: Jing Wu, Rixiang Ni, Feng Huang, Zhaobing Qiu, Liqiong Chen, Changhai Luo, Yunxiang Li, Youli Li

    Abstract: Infrared small target detection (IRSTD) tasks are extremely challenging for two main reasons: 1) it is difficult to obtain accurate labelling information that is critical to existing methods, and 2) infrared (IR) small target information is easily lost in deep networks. To address these issues, we propose a single-point supervised high-resolution dynamic network (SSHD-Net). In contrast to existing…

    Submitted 7 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

  50. arXiv:2408.01120  [pdf, other]

    cs.CV

    An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding

    Authors: Wei Chen, Long Chen, Yu Wu

    Abstract: Most advanced visual grounding methods rely on Transformers for visual-linguistic feature fusion. However, these Transformer-based approaches encounter a significant drawback: the computational costs escalate quadratically due to the self-attention mechanism in the Transformer Encoder, particularly when dealing with high-resolution images or long context sentences. This quadratic increase in compu…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 21 pages, 10 figures, 9 tables. Accepted to ECCV 2024