Showing 1–50 of 1,340 results for author: Yang, C

Searching in archive cs.
  1. arXiv:2409.00846  [pdf, ps, other]

    math.CO cs.CC math.MG

    Undecidability of Translational Tiling of the 4-dimensional Space with a Set of 4 Polyhypercubes

    Authors: Chao Yang, Zhujun Zhang

    Abstract: Recently, Greenfeld and Tao disproved the conjecture that translational tilings of a single tile can always be periodic [Ann. Math. 200 (2024), 301-363]. In another paper [to appear in J. Eur. Math. Soc.], they also show that if the dimension $n$ is part of the input, the translational tiling for subsets of $\mathbb{Z}^n$ with one tile is undecidable. These two results are very strong pieces of evid…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 18 pages, 20 figures

  2. Dependency-Aware Code Naturalness

    Authors: Chen Yang, Junjie Chen, Jiajun Jiang, Yuliang Huang

    Abstract: Code naturalness, which captures repetitiveness and predictability in programming languages, has proven valuable for various code-related tasks in software engineering. However, precisely measuring code naturalness remains a fundamental challenge. Existing methods measure code naturalness over individual lines of code while ignoring the deep semantic relations among different lines, e.g., program…

    Submitted 1 September, 2024; originally announced September 2024.

  3. arXiv:2409.00727  [pdf, other]

    cs.AI cs.CL cs.IR

    Hound: Hunting Supervision Signals for Few and Zero Shot Node Classification on Text-attributed Graph

    Authors: Yuxiang Wang, Xiao Yan, Shiyu Jin, Quanqing Xu, Chuanhui Yang, Yuanyuan Zhu, Chuang Hu, Bo Du, Jiawei Jiang

    Abstract: Text-attributed graph (TAG) is an important type of graph structured data with text descriptions for each node. Few- and zero-shot node classification on TAGs have many applications in fields such as academia and social networks. However, the two tasks are challenging due to the lack of supervision signals, and existing methods only use the contrastive loss to align graph-based node embedding and…

    Submitted 1 September, 2024; originally announced September 2024.

  4. arXiv:2409.00369  [pdf, other]

    cs.CL

    An Empirical Study on Information Extraction using Large Language Models

    Authors: Ridong Han, Chaohao Yang, Tao Peng, Prayag Tiwari, Xiang Wan, Lu Liu, Benyou Wang

    Abstract: Human-like large language models (LLMs), especially the most powerful and popular ones in OpenAI's GPT family, have proven to be very helpful for many natural language processing (NLP) related tasks. Therefore, various attempts have been made to apply LLMs to information extraction (IE), which is a fundamental NLP task that involves extracting information from unstructured plain text. To demonstra…

    Submitted 3 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: This article has an original arxiv version entitled "Is Information Extraction Solved by ChatGPT? An Analysis of Performance, Evaluation Criteria, Robustness and Errors", whose url link is arXiv/2305.14450

  5. arXiv:2409.00349  [pdf, other]

    cs.CV

    ToddlerAct: A Toddler Action Recognition Dataset for Gross Motor Development Assessment

    Authors: Hsiang-Wei Huang, Jiacheng Sun, Cheng-Yen Yang, Zhongyu Jiang, Li-Yu Huang, Jenq-Neng Hwang, Yu-Ching Yeh

    Abstract: Assessing gross motor development in toddlers is crucial for understanding their physical development and identifying potential developmental delays or disorders. However, existing datasets for action recognition primarily focus on adults, lacking the diversity and specificity required for accurate assessment in toddlers. In this paper, we present ToddlerAct, a toddler gross motor action recogniti…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: Accepted by 2024 ECCV ABAW Workshop

  6. arXiv:2408.17214  [pdf, other]

    cs.IR

    Efficient Multi-task Prompt Tuning for Recommendation

    Authors: Ting Bai, Le Huang, Yue Yu, Cheng Yang, Cheng Hou, Zhe Zhao, Chuan Shi

    Abstract: With the expansion of business scenarios, real recommender systems are facing challenges in dealing with the constantly emerging new tasks in multi-task learning frameworks. In this paper, we attempt to improve the generalization ability of multi-task recommendations when dealing with new tasks. We find that joint training will enhance the performance of the new task but always negatively impact e…

    Submitted 30 August, 2024; originally announced August 2024.

  7. arXiv:2408.16180  [pdf, other]

    eess.AS cs.CL cs.SD

    Benchmarking Japanese Speech Recognition on ASR-LLM Setups with Multi-Pass Augmented Generative Error Correction

    Authors: Yuka Ko, Sheng Li, Chao-Han Huck Yang, Tatsuya Kawahara

    Abstract: With the strong representational power of large language models (LLMs), generative error correction (GER) for automatic speech recognition (ASR) aims to provide semantic and phonetic refinements to address ASR errors. This work explores how LLM-based GER can enhance and expand the capabilities of Japanese language processing, presenting the first GER benchmark for Japanese ASR with 0.9-2.6k text u…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: submitted to SLT2024

  8. arXiv:2408.15991  [pdf, other]

    cs.CV

    Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

    Authors: Shengyuan Zhang, Ling Yang, Zejian Li, An Zhao, Chenye Meng, Changyuan Yang, Guang Yang, Zhiyuan Yang, Lingyun Sun

    Abstract: Accelerating the sampling speed of diffusion models remains a significant challenge. Recent score distillation methods distill a heavy teacher model into a one-step student generator, which is optimized by calculating the difference between the two score functions on the samples generated by the student model. However, there is a score mismatch issue in the early stage of the distillation process…

    Submitted 28 August, 2024; originally announced August 2024.

  9. arXiv:2408.14851  [pdf, other]

    cs.IR

    Graph and Sequential Neural Networks in Session-based Recommendation: A Survey

    Authors: Zihao Li, Chao Yang, Yakun Chen, Xianzhi Wang, Hongxu Chen, Guandong Xu, Lina Yao, Quan Z. Sheng

    Abstract: Recent years have witnessed the remarkable success of recommendation systems (RSs) in alleviating the information overload problem. As a new paradigm of RSs, session-based recommendation (SR) specializes in users' short-term preference capture and aims to provide a more dynamic and timely recommendation based on the ongoing interacted actions. In this survey, we will give a comprehensive overview…

    Submitted 27 August, 2024; originally announced August 2024.

  10. arXiv:2408.14757  [pdf, other]

    cs.CV cs.LG

    Learning effective pruning at initialization from iterative pruning

    Authors: Shengkai Liu, Yaofeng Cheng, Fusheng Zha, Wei Guo, Lining Sun, Zhenshan Bing, Chenguang Yang

    Abstract: Pruning at initialization (PaI) reduces training costs by removing weights before training, which becomes increasingly crucial with the growing network size. However, current PaI methods still have a large accuracy gap with iterative pruning, especially at high sparsity levels. This raises an intriguing question: can we get inspiration from iterative pruning to improve the PaI performance? In the…

    Submitted 26 August, 2024; originally announced August 2024.

  11. arXiv:2408.11599  [pdf, other]

    cs.CL cs.AI

    Cause-Aware Empathetic Response Generation via Chain-of-Thought Fine-Tuning

    Authors: Xinhao Chen, Chong Yang, Man Lan, Li Cai, Yang Chen, Tu Hu, Xinlin Zhuang, Aimin Zhou

    Abstract: Empathetic response generation endows agents with the capability to comprehend dialogue contexts and react to expressed emotions. Previous works predominantly focus on leveraging the speaker's emotional labels, but ignore the importance of emotion cause reasoning in empathetic response generation, which hinders the model's capacity for further affective understanding and cognitive inference. In th…

    Submitted 21 August, 2024; originally announced August 2024.

  12. arXiv:2408.09865  [pdf, other]

    cs.LG cs.CL cs.IR

    MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation

    Authors: Ching-Wen Yang, Che Wei Chen, Kun-da Wu, Hao Xu, Jui-Feng Yao, Hung-Yu Kao

    Abstract: Explainable Recommendation task is designed to receive a pair of user and item and output explanations to justify why an item is recommended to a user. Many models treat review-generation as a proxy of explainable recommendation. Although they are able to generate fluent and grammatical sentences, they suffer from generality and hallucination issues. We propose a personalized, aspect-controlled mo…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 8 main pages, 10 pages for appendix. Under review

  13. Revisiting Reciprocal Recommender Systems: Metrics, Formulation, and Method

    Authors: Chen Yang, Sunhao Dai, Yupeng Hou, Wayne Xin Zhao, Jun Xu, Yang Song, Hengshu Zhu

    Abstract: Reciprocal recommender systems (RRS), conducting bilateral recommendations between two involved parties, have gained increasing attention for enhancing matching efficiency. However, the majority of existing methods in the literature still reuse conventional ranking metrics to separately assess the performance on each side of the recommendation process. These methods overlook the fact that the rank…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: KDD 2024

  14. arXiv:2408.09665  [pdf, other]

    cs.CV

    SG-GS: Photo-realistic Animatable Human Avatars with Semantically-Guided Gaussian Splatting

    Authors: Haoyu Zhao, Chen Yang, Hao Wang, Xingyue Zhao, Wei Shen

    Abstract: Reconstructing photo-realistic animatable human avatars from monocular videos remains challenging in computer vision and graphics. Recently, methods using 3D Gaussians to represent the human body have emerged, offering faster optimization and real-time rendering. However, due to ignoring the crucial role of human body semantic information which represents the intrinsic structure and connections wi…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 12 pages, 5 figures

  15. arXiv:2408.09663  [pdf, other]

    cs.CV

    CHASE: 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning

    Authors: Haoyu Zhao, Hao Wang, Chen Yang, Wei Shen

    Abstract: Recent advancements in human avatar synthesis have utilized radiance fields to reconstruct photo-realistic animatable human avatars. However, both NeRFs-based and 3DGS-based methods struggle with maintaining 3D consistency and exhibit suboptimal detail reconstruction, especially with sparse inputs. To address this challenge, we propose CHASE, which introduces supervision from intrinsic 3D consiste…

    Submitted 19 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: 13 pages, 6 figures

  16. arXiv:2408.09218  [pdf]

    eess.IV cs.CV cs.LG

    FQGA-single: Towards Fewer Training Epochs and Fewer Model Parameters for Image-to-Image Translation Tasks

    Authors: Cho Yang

    Abstract: CycleGAN was trained on SynthRAD Grand Challenge Dataset using the single-epoch modification (SEM) method proposed in this paper which is referred to as (CycleGAN-single) compared to the usual method of training CycleGAN on around 200 epochs (CycleGAN-multi). Model performance was evaluated qualitatively and quantitatively with quantitative performance metrics like PSNR, SSIM, MAE and MSE. The co…

    Submitted 22 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

  17. arXiv:2408.08685  [pdf, other]

    cs.LG cs.AI cs.CY cs.SI

    Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks?

    Authors: Zhongjian Zhang, Xiao Wang, Huichi Zhou, Yue Yu, Mengmei Zhang, Cheng Yang, Chuan Shi

    Abstract: Graph neural networks (GNNs) are vulnerable to adversarial perturbations, especially for topology attacks, and many methods that improve the robustness of GNNs have received considerable attention. Recently, we have witnessed the significant success of large language models (LLMs), leading many to explore the great potential of LLMs on GNNs. However, they mainly focus on improving the performance…

    Submitted 16 August, 2024; originally announced August 2024.

  18. arXiv:2408.05575  [pdf, other]

    cs.AI cs.GT

    In-Context Exploiter for Extensive-Form Games

    Authors: Shuxin Li, Chang Yang, Youzhi Zhang, Pengdeng Li, Xinrun Wang, Xiao Huang, Hau Chan, Bo An

    Abstract: Nash equilibrium (NE) is a widely adopted solution concept in game theory due to its stability property. However, we observe that the NE strategy might not always yield the best results, especially against opponents who do not adhere to NE strategies. Based on this observation, we pose a new game-solving question: Can we learn a model that can exploit any, even NE, opponent to maximize their own u…

    Submitted 10 August, 2024; originally announced August 2024.

  19. arXiv:2408.05124  [pdf, other]

    cs.CR cs.CV

    Modeling Electromagnetic Signal Injection Attacks on Camera-based Smart Systems: Applications and Mitigation

    Authors: Youqian Zhang, Michael Cheung, Chunxi Yang, Xinwei Zhai, Zitong Shen, Xinyu Ji, Eugene Y. Fu, Sze-Yiu Chau, Xiapu Luo

    Abstract: Numerous safety- or security-critical systems depend on cameras to perceive their surroundings, further allowing artificial intelligence (AI) to analyze the captured images to make important decisions. However, a concerning attack vector has emerged, namely, electromagnetic waves, which pose a threat to the integrity of these systems. Such attacks enable attackers to manipulate the images remotely…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 13 pages, 10 figures, 4 tables

  20. arXiv:2408.03748  [pdf, other]

    cs.CV

    Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model

    Authors: Guoqing Zhu, Honghu Pan, Qiang Wang, Chao Tian, Chao Yang, Zhenyu He

    Abstract: In challenging low light and adverse weather conditions, thermal vision algorithms, especially object detection, have exhibited remarkable potential, contrasting with the frequent struggles encountered by visible vision algorithms. Nevertheless, the efficacy of thermal vision algorithms driven by deep learning models remains constrained by the paucity of available training data samples. To this end, thi…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: accepted by ACM MM 2024/ACM MM24

  21. arXiv:2408.02311  [pdf, other]

    cs.SE

    PTM4Tag+: Tag Recommendation of Stack Overflow Posts with Pre-trained Models

    Authors: Junda He, Bowen Xu, Zhou Yang, DongGyun Han, Chengran Yang, Jiakun Liu, Zhipeng Zhao, David Lo

    Abstract: Stack Overflow is one of the most influential Software Question & Answer (SQA) websites, hosting millions of programming-related questions and answers. Tags play a critical role in efficiently organizing the contents in Stack Overflow and are vital to support a range of site operations, e.g., querying relevant content. Poorly selected tags often raise problems like tag ambiguity and tag explosion.…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2203.10965

  22. arXiv:2408.02196  [pdf, ps, other]

    math.CO cs.CC math.MG

    Undecidability of Translational Tiling of the 3-dimensional Space with a Set of 6 Polycubes

    Authors: Chao Yang, Zhujun Zhang

    Abstract: This paper focuses on the undecidability of translational tiling of $n$-dimensional space $\mathbb{Z}^n$ with a set of $k$ tiles. It is known that tiling $\mathbb{Z}^2$ with translated copies of a set of $8$ tiles is undecidable. Greenfeld and Tao gave strong evidence in a series of works that for sufficiently large dimension $n$, the translational tiling problem for $\mathbb{Z}^n$ might be unde…

    Submitted 4 August, 2024; originally announced August 2024.

  23. arXiv:2407.21331  [pdf, other]

    cs.CV

    CAMAv2: A Vision-Centric Approach for Static Map Element Annotation

    Authors: Shiyuan Chen, Jiaxin Zhang, Ruohong Mei, Yingfeng Cai, Haoran Yin, Tao Chen, Wei Sui, Cong Yang

    Abstract: The recent development of online static map element (a.k.a. HD map) construction algorithms has raised a vast demand for data with ground truth annotations. However, available public datasets currently cannot provide high-quality training data regarding consistency and accuracy. For instance, the manually labelled (low efficiency) nuScenes still contains misalignment and inconsistency between the HD…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2309.11754

  24. An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation

    Authors: Cheng Yang, Guoping Huang, Mo Yu, Zhirui Zhang, Siheng Li, Mingming Yang, Shuming Shi, Yujiu Yang, Lemao Liu

    Abstract: Word-level AutoCompletion (WLAC) is a rewarding yet challenging task in Computer-aided Translation. Existing work addresses this task through a classification model based on a neural network that maps the hidden vector of the input context into its corresponding label (i.e., the candidate target word is treated as a label). Since the context hidden vector itself does not take the label into account…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted to TACL 2024

  25. arXiv:2407.19041  [pdf, other]

    cs.AI cs.CL

    Optimizing Numerical Estimation and Operational Efficiency in the Legal Domain through Large Language Models

    Authors: Jia-Hong Huang, Chao-Chun Yang, Yixian Shen, Alessio M. Pacces, Evangelos Kanoulas

    Abstract: The legal landscape encompasses a wide array of lawsuit types, presenting lawyers with challenges in delivering timely and accurate information to clients, particularly concerning critical aspects like potential imprisonment duration or financial repercussions. Compounded by the scarcity of legal experts, there's an urgent need to enhance the efficiency of traditional legal workflows. Recent advan…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: The paper has been accepted by the 33rd ACM International Conference on Information and Knowledge Management (CIKM) in 2024

  26. arXiv:2407.18480  [pdf, other]

    cs.LG

    Scalable Graph Compressed Convolutions

    Authors: Junshu Sun, Chenxue Yang, Shuhui Wang, Qingming Huang

    Abstract: Designing effective graph neural networks (GNNs) with message passing has two fundamental challenges, i.e., determining optimal message-passing pathways and designing local aggregators. Previous methods of designing optimal pathways are limited with information loss on the input features. On the other hand, existing local aggregators generally fail to extract multi-scale features and approximate d…

    Submitted 25 July, 2024; originally announced July 2024.

  27. arXiv:2407.17344  [pdf, other]

    cs.CL

    Label Alignment and Reassignment with Generalist Large Language Model for Enhanced Cross-Domain Named Entity Recognition

    Authors: Ke Bao, Chonghuan Yang

    Abstract: Named entity recognition in the in-domain supervised and few-shot settings has been extensively discussed in the NLP community and has made significant progress. However, cross-domain NER, a more common task in practical scenarios, still poses a challenge for most NER methods. Previous research efforts in that area primarily focus on knowledge transfer such as correlate label information from source…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 9 pages, 4 figures

  28. arXiv:2407.16370  [pdf, other]

    cs.CL cs.SD eess.AS

    Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction

    Authors: Rithik Sachdev, Zhong-Qiu Wang, Chao-Han Huck Yang

    Abstract: Building upon the strength of modern large language models (LLMs), generative error correction (GEC) has emerged as a promising paradigm that can elevate the performance of modern automatic speech recognition (ASR) systems. One representative approach is to leverage in-context learning to prompt LLMs so that a better hypothesis can be generated by the LLMs based on a carefully-designed prompt and…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: in submission

  29. arXiv:2407.16351  [pdf, other]

    cs.HC

    Datasets of Visualization for Machine Learning

    Authors: Can Liu, Ruike Jiang, Shaocong Tan, Jiacheng Yu, Chaofan Yang, Hanning Shao, Xiaoru Yuan

    Abstract: Datasets of visualization play a crucial role in automating data-driven visualization pipelines, serving as the foundation for supervised model training and algorithm benchmarking. In this paper, we survey the literature on visualization datasets and provide a comprehensive overview of existing visualization datasets, including their data types, formats, supported tasks, and openness. We propose a…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 15 pages

  30. arXiv:2407.16327  [pdf, other]

    cs.CR cs.CV

    Understanding Impacts of Electromagnetic Signal Injection Attacks on Object Detection

    Authors: Youqian Zhang, Chunxi Yang, Eugene Y. Fu, Qinhong Jiang, Chen Yan, Sze-Yiu Chau, Grace Ngai, Hong-Va Leong, Xiapu Luo, Wenyuan Xu

    Abstract: Object detection can localize and identify objects in images, and it is extensively employed in critical multimedia applications such as security surveillance and autonomous driving. Despite the success of existing object detection models, they are often evaluated in ideal scenarios where captured images guarantee the accurate and complete representation of the detecting scenes. However, images ca…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 2024 IEEE International Conference on Multimedia and Expo (ICME), July 15 - July 19, 2024, Niagara Falls, Ontario, Canada

  31. arXiv:2407.16008  [pdf, other]

    cs.CL

    Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation

    Authors: Jiaming Shen, Ran Xu, Yennie Jun, Zhen Qin, Tianqi Liu, Carl Yang, Yi Liang, Simon Baumgartner, Michael Bendersky

    Abstract: Reward models (RMs) are crucial for aligning large language models (LLMs) with human preferences. They are trained using preference datasets where each example consists of one input prompt, two responses, and a preference label. As curating a high-quality human labeled preference dataset is both time-consuming and expensive, people often rely on existing powerful LLMs for preference label generati…

    Submitted 22 July, 2024; originally announced July 2024.

  32. arXiv:2407.15066  [pdf, other]

    cs.CV

    LSReGen: Large-Scale Regional Generator via Backward Guidance Framework

    Authors: Bowen Zhang, Cheng Yang, Xuanhui Liu

    Abstract: In recent years, advancements in AIGC (Artificial Intelligence Generated Content) technology have significantly enhanced the capabilities of large text-to-image models. Despite these improvements, controllable image generation remains a challenge. Current methods, such as training, forward guidance, and backward guidance, have notable limitations. The first two approaches either demand substantial…

    Submitted 21 July, 2024; originally announced July 2024.

  33. arXiv:2407.14829  [pdf, other]

    cs.CL

    Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks

    Authors: Jiayu Lin, Guanrong Chen, Bojun Jin, Chenyang Li, Shutong Jia, Wancong Lin, Yang Sun, Yuhang He, Caihua Yang, Jianzhu Bao, Jipeng Wu, Wen Su, Jinglu Chen, Xinyi Li, Tianyu Chen, Mingjie Han, Shuaiwen Du, Zijian Wang, Jiyin Li, Fuzhong Suo, Hao Wang, Nuanchen Lin, Xuanjing Huang, Changjian Jiang, RuiFeng Xu, et al. (4 additional authors not shown)

    Abstract: In this paper we present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023), and introduce the related datasets. We organize two tracks to handle the argumentative generation tasks in different scenarios, namely, Counter-Argument Generation (Track 1) and Claim-based Argument Generation (Track 2). Each track is equipped with its distinct data…

    Submitted 24 July, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

  34. arXiv:2407.14651  [pdf, other]

    eess.IV cs.AI cs.CV

    Improving Representation of High-frequency Components for Medical Foundation Models

    Authors: Yuetan Chu, Yilan Zhang, Zhongyi Han, Changchun Yang, Longxi Zhou, Gongning Luo, Xin Gao

    Abstract: Foundation models have recently attracted significant attention for their impressive generalizability across diverse downstream tasks. However, these models are demonstrated to exhibit great limitations in representing high-frequency components and fine-grained details. In many medical imaging tasks, the precise representation of such information is crucial due to the inherently intricate anatomic…

    Submitted 25 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  35. arXiv:2407.13937  [pdf, other]

    cs.CV

    Boosting Online 3D Multi-Object Tracking through Camera-Radar Cross Check

    Authors: Sheng-Yao Kuan, Jen-Hao Cheng, Hsiang-Wei Huang, Wenhao Chai, Cheng-Yen Yang, Hugo Latapie, Gaowen Liu, Bing-Fei Wu, Jenq-Neng Hwang

    Abstract: In the domain of autonomous driving, the integration of multi-modal perception techniques based on data from diverse sensors has demonstrated substantial progress. Effectively surpassing the capabilities of state-of-the-art single-modality detectors through sensor fusion remains an active challenge. This work leverages the respective advantages of cameras in perspective view and radars in Bird's E…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 2024 IEEE Intelligent Vehicles Symposium (IV)

  36. arXiv:2407.13460  [pdf, other]

    cs.CV cs.LG

    SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders

    Authors: Sheng-Wei Li, Zi-Xiang Wei, Wei-Jie Chen, Yi-Hsin Yu, Chih-Yuan Yang, Jane Yung-jen Hsu

    Abstract: Existing zero-shot skeleton-based action recognition methods utilize projection networks to learn a shared latent space of skeleton features and semantic embeddings. The inherent imbalance in action recognition datasets, characterized by variable skeleton sequences yet constant class labels, presents significant challenges for alignment. To address the imbalance, we propose SA-DVAE -- Semantic Ali…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  37. arXiv:2407.11677  [pdf, other]

    cs.CV

    Video-Language Alignment via Spatio-Temporal Graph Transformer

    Authors: Shi-Xue Zhang, Hongfa Wang, Xiaobin Zhu, Weibo Gu, Tianjin Zhang, Chun Yang, Wei Liu, Xu-Cheng Yin

    Abstract: Video-language alignment is a crucial multi-modal task that benefits various downstream applications, e.g., video-text retrieval and video question answering. Existing methods either utilize multi-modal information in video-text pairs or apply global and local alignment techniques to promote alignment precision. However, these methods often fail to fully explore the spatio-temporal relationships a…

    Submitted 23 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: under review

  38. arXiv:2407.11480  [pdf, other]

    cs.LG cs.AI

    AIGC for Industrial Time Series: From Deep Generative Models to Large Generative Models

    Authors: Lei Ren, Haiteng Wang, Yang Tang, Chunhua Yang

    Abstract: With the remarkable success of generative models like ChatGPT, Artificial Intelligence Generated Content (AIGC) is undergoing explosive development. Not limited to text and images, generative models can generate industrial time series data, addressing challenges such as the difficulty of data collection and data annotation. Due to their outstanding generation ability, they have been widely used in…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 17 pages, 4 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  39. GeoMix: Towards Geometry-Aware Data Augmentation

    Authors: Wentao Zhao, Qitian Wu, Chenxiao Yang, Junchi Yan

    Abstract: Mixup has shown considerable success in mitigating the challenges posed by limited labeled data in image classification. By synthesizing samples through the interpolation of features and labels, Mixup effectively addresses the issue of data scarcity. However, it has rarely been explored in graph learning tasks due to the irregularity and connectivity of graph data. Specifically, in node classifica…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Published as a conference paper at KDD 2024

  40. arXiv:2407.10142  [pdf, other]

    cs.CV

    PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration

    Authors: Runzhao Yao, Shaoyi Du, Wenting Cui, Canhui Tang, Chengwu Yang

    Abstract: Learning rotation-invariant distinctive features is a fundamental requirement for point cloud registration. Existing methods often use rotation-sensitive networks to extract features, while employing rotation augmentation to learn an approximate invariant mapping rudely. This makes networks fragile to rotations, overweight, and hinders the distinctiveness of features. To tackle these problems, we…

    Submitted 14 July, 2024; originally announced July 2024.

  41. arXiv:2407.09886  [pdf, other]

    eess.AS cs.CL cs.SD

    Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation

    Authors: Chun-Yi Kuan, Chih-Kai Yang, Wei-Ping Huang, Ke-Han Lu, Hung-yi Lee

    Abstract: In this work, we introduce Speech-Copilot, a modular framework for instruction-oriented speech-processing tasks that minimizes human effort in toolset construction. Unlike end-to-end methods using large audio-language models, Speech-Copilot builds speech processing-specific toolsets by analyzing pre-collected task instructions and breaking tasks into manageable sub-tasks. It features a flexible ag…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 8 pages, 2 figures

  42. arXiv:2407.07924  [pdf, other]

    math.OC cs.AI cs.CL cs.LG

    Solving General Natural-Language-Description Optimization Problems with Large Language Models

    Authors: Jihai Zhang, Wei Wang, Siyan Guo, Li Wang, Fangquan Lin, Cheng Yang, Wotao Yin

    Abstract: Optimization problems seek to find the best solution to an objective under a set of constraints, and have been widely investigated in real-world applications. Modeling and solving optimization problems in a specific domain typically require a combination of domain knowledge, mathematical skills, and programming ability, making it difficult for general users and even domain professionals. In this p…

    Submitted 9 July, 2024; originally announced July 2024.

  43. arXiv:2407.07061  [pdf, other]

    cs.CL

    Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

    Authors: Weize Chen, Ziming You, Ran Li, Yitong Guan, Chen Qian, Chenyang Zhao, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun

    Abstract: The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle with integrating diverse capable third-party agents due to reliance on agents defined within their own ecosystems. They also face challenges in simulating distributed environments, as most frameworks are limited to…

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: work in progress

  44. arXiv:2407.06957  [pdf, other]

    eess.AS cs.CL cs.CY

    Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models

    Authors: Yi-Cheng Lin, Tzu-Quan Lin, Chih-Kai Yang, Ke-Han Lu, Wei-Chih Chen, Chun-Yi Kuan, Hung-yi Lee

    Abstract: Speech Integrated Large Language Models (SILLMs) combine large language models with speech perception to perform diverse tasks, from emotion recognition to speaker verification, demonstrating universal audio understanding capability. However, these models may amplify biases present in training data, potentially leading to biased access to information for marginalized groups. This work introduce…

    Submitted 9 July, 2024; originally announced July 2024.

  45. arXiv:2407.05934  [pdf, other]

    cs.LG cs.AI

    Graph Anomaly Detection with Noisy Labels by Reinforcement Learning

    Authors: Zhu Wang, Shuang Zhou, Junnan Dong, Chang Yang, Xiao Huang, Shengjie Zhao

    Abstract: Graph anomaly detection (GAD) has been widely applied in many areas, e.g., fraud detection in finance and robot accounts in social networks. Existing methods are dedicated to identifying the outlier nodes that deviate from normal ones. However, they rely heavily on high-quality annotation, which is hard to obtain in real-world scenarios; this could lead to severely degraded performance based on noisy…

    Submitted 8 July, 2024; originally announced July 2024.

  46. arXiv:2407.05718  [pdf, other]

    cs.CL

    A Factuality and Diversity Reconciled Decoding Method for Knowledge-Grounded Dialogue Generation

    Authors: Chenxu Yang, Zheng Lin, Chong Tian, Liang Pang, Lanrui Wang, Zhengyang Tong, Qirong Ho, Yanan Cao, Weiping Wang

    Abstract: Grounding external knowledge can enhance the factuality of responses in dialogue generation. However, excessive emphasis on it might result in a lack of engaging and diverse expressions. By introducing randomness in sampling, current approaches can increase diversity. Nevertheless, such sampling methods could undermine the factuality of dialogue generation. In this study, to disc…

    Submitted 8 July, 2024; originally announced July 2024.

  47. arXiv:2407.05361  [pdf, other]

    eess.AS cs.CL

    Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

    Authors: Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, Zhizheng Wu

    Abstract: Recently, speech generation models have made significant progress by using large-scale training data. However, the research community struggles to produce highly spontaneous and human-like speech due to the lack of large-scale, diverse, and spontaneous speech data. This paper presents Emilia, the first multilingual speech generation dataset from in-the-wild speech data, and Emilia-Pipe, the first op…

    Submitted 12 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Fix typos

  48. arXiv:2407.05216  [pdf, other]

    cs.CL

    Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course

    Authors: Cheng-Han Chiang, Wei-Chih Chen, Chun-Yi Kuan, Chienchou Yang, Hung-yi Lee

    Abstract: Using large language models (LLMs) for automatic evaluation has become an important evaluation method in NLP research. However, it is unclear whether these LLM-based evaluators can be applied in real-world classrooms to assess student assignments. This empirical report shares how we use GPT-4 as an automatic assignment evaluator in a university course with 1,028 students. Based on student response…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: An empirical report of our course: Introduction to Generative AI 2024 Spring (https://speech.ee.ntu.edu.tw/~hylee/genai/2024-spring.php)

  49. arXiv:2407.04738  [pdf]

    eess.SP cs.LG cs.RO

    A Contrastive Learning Based Convolutional Neural Network for ERP Brain-Computer Interfaces

    Authors: Yuntian Cui, Xinke Shen, Dan Zhang, Chen Yang

    Abstract: ERP-based EEG detection is gaining increasing attention in the field of brain-computer interfaces. However, due to the complexity of ERP signal components, their low signal-to-noise ratio, and significant inter-subject variability, cross-subject ERP signal detection has been challenging. The continuous advancement in deep learning has greatly contributed to addressing this issue. This brief propos…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 5 pages, 2 figures, 2 tables

  50. A Pairwise DomMix Attentive Adversarial Network for Unsupervised Domain Adaptive Object Detection

    Authors: Jie Shao, Jiacheng Wu, Wenzhong Shen, Cheng Yang

    Abstract: Unsupervised Domain Adaptive Object Detection (DAOD) adapts a model trained on a source domain to an unlabeled target domain for object detection. Existing unsupervised DAOD methods usually perform feature alignments from the target to the source. Unidirectional domain transfer would omit information about the target samples and result in suboptimal adaptation when there are large domain shif…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Published in IEEE Signal Processing Letters, 2023