Showing 1–50 of 4,037 results for author: Yang, Y

  1. arXiv:2409.03644  [pdf, other]

    cs.CV

    RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images

    Authors: Benzhi Wang, Jingkai Zhou, Jingqi Bai, Yang Yang, Weihua Chen, Fan Wang, Zhen Lei

    Abstract: In recent years, diffusion models have revolutionized visual generation, outperforming traditional frameworks like Generative Adversarial Networks (GANs). However, generating images of humans with realistic semantic parts, such as hands and faces, remains a significant challenge due to their intricate structural complexity. To address this issue, we propose a novel post-processing solution named R…

    Submitted 5 September, 2024; originally announced September 2024.

  2. arXiv:2409.03198  [pdf, other]

    cs.CV

    RoomDiffusion: A Specialized Diffusion Model in the Interior Design Industry

    Authors: Zhaowei Wang, Ying Hao, Hao Wei, Qing Xiao, Lulu Chen, Yulong Li, Yue Yang, Tianyi Li

    Abstract: Recent advancements in text-to-image diffusion models have significantly transformed visual content generation, yet their application in specialized fields such as interior design remains underexplored. In this paper, we present RoomDiffusion, a pioneering diffusion model meticulously tailored for the interior design industry. To begin with, we build from scratch a whole data pipeline to update an…

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2409.02727  [pdf, other]

    cs.CL cs.IR

    Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models?

    Authors: Yixuan Tang, Yi Yang

    Abstract: The significant advancements of Large Language Models (LLMs) in generative tasks have led to a growing body of work exploring LLM-based embedding models. While these models, employing different pooling and attention strategies, have achieved state-of-the-art performance on public embedding benchmarks, questions still arise about what constitutes an effective design for LLM-based embedding models.…

    Submitted 5 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: https://github.com/yixuantt/PoolingAndAttn

  4. arXiv:2409.02444  [pdf, other]

    cs.RO eess.SY

    USV-AUV Collaboration Framework for Underwater Tasks under Extreme Sea Conditions

    Authors: Jingzehua Xu, Guanwen Xie, Xinqi Wang, Yiyuan Yang, Shuai Zhang

    Abstract: Autonomous underwater vehicles (AUVs) are valuable for ocean exploration due to their flexibility and ability to carry communication and detection units. Nevertheless, AUVs alone often face challenges in harsh and extreme sea conditions. This study introduces an unmanned surface vehicle (USV)-AUV collaboration framework, which includes high-precision multi-AUV positioning using USV path planning vi…

    Submitted 4 September, 2024; originally announced September 2024.

  5. arXiv:2409.02428  [pdf, other]

    cs.LG cs.AI cs.CL eess.SY

    Large Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement Learning

    Authors: Guanwen Xie, Jingzehua Xu, Yiyuan Yang, Shuai Zhang

    Abstract: Leveraging large language models (LLMs) for designing reward functions demonstrates significant potential. However, achieving effective design and improvement of reward functions in reinforcement learning (RL) tasks with complex custom environments and multiple requirements presents considerable challenges. In this paper, we enable LLMs to be effective white-box searchers, highlighting their advan…

    Submitted 4 September, 2024; originally announced September 2024.

  6. Action-Based ADHD Diagnosis in Video

    Authors: Yichun Li, Yuxing Yang, Syed Nohsen Naqvi

    Abstract: Attention Deficit Hyperactivity Disorder (ADHD) causes significant impairment in various domains. Early diagnosis of ADHD and treatment could significantly improve the quality of life and functioning. Recently, machine learning methods have improved the accuracy and efficiency of the ADHD diagnosis process. However, the cost of the equipment and trained staff required by the existing methods are g…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 31st European Symposium on Artificial Neural Networks

  7. arXiv:2409.01646  [pdf, other]

    cs.RO

    BEVNav: Robot Autonomous Navigation Via Spatial-Temporal Contrastive Learning in Bird's-Eye View

    Authors: Jiahao Jiang, Yuxiang Yang, Yingqi Deng, Chenlong Ma, Jing Zhang

    Abstract: Goal-driven mobile robot navigation in map-less environments requires effective state representations for reliable decision-making. Inspired by the favorable properties of Bird's-Eye View (BEV) in point clouds for visual perception, this paper introduces a novel navigation approach named BEVNav. It employs deep reinforcement learning to learn BEV representations and enhance decision-making reliabi…

    Submitted 3 September, 2024; originally announced September 2024.

  8. arXiv:2409.01522  [pdf, other]

    cs.CV

    Lagrangian Motion Fields for Long-term Motion Generation

    Authors: Yifei Yang, Zikai Huang, Chenshu Xu, Shengfeng He

    Abstract: Long-term motion generation is a challenging task that requires producing coherent and realistic sequences over extended durations. Current methods primarily rely on framewise motion representations, which capture only static spatial details and overlook temporal dynamics. This approach leads to significant redundancy across the temporal dimension, complicating the generation of effective long-ter…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 13 pages, 9 figures

  9. arXiv:2409.01495  [pdf, other]

    cs.CL

    The Compressor-Retriever Architecture for Language Model OS

    Authors: Yuan Yang, Siheng Xiong, Ehsan Shareghi, Faramarz Fekri

    Abstract: Recent advancements in large language models (LLMs) have significantly enhanced their capacity to aggregate and process information across multiple modalities, enabling them to perform a wide range of tasks such as multimodal data querying, tool usage, web interactions, and handling long documents. These capabilities pave the way for transforming LLMs from mere chatbots into general-purpose agents…

    Submitted 2 September, 2024; originally announced September 2024.

  10. arXiv:2409.00901  [pdf, other]

    stat.ML cs.LG math.NA

    On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks

    Authors: Yunfei Yang

    Abstract: This paper studies the problem of how efficiently functions in the Sobolev spaces $\mathcal{W}^{s,q}([0,1]^d)$ and Besov spaces $\mathcal{B}^s_{q,r}([0,1]^d)$ can be approximated by deep ReLU neural networks with width $W$ and depth $L$, when the error is measured in the $L^p([0,1]^d)$ norm. This problem has been studied by several recent works, which obtained the approximation rate…

    Submitted 1 September, 2024; originally announced September 2024.

  11. arXiv:2409.00819  [pdf, other]

    cs.SD cs.CL eess.AS

    LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization

    Authors: Zengrui Jin, Yifan Yang, Mohan Shi, Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Lingwei Meng, Long Lin, Yong Xu, Shi-Xiong Zhang, Daniel Povey

    Abstract: The evolving speech processing landscape is increasingly focused on complex scenarios like meetings or cocktail parties with multiple simultaneous speakers and far-field conditions. Existing methodologies for addressing these challenges fall into two categories: multi-channel and single-channel solutions. Single-channel approaches, notable for their generality and convenience, do not require speci…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: InterSpeech 2024

  12. arXiv:2409.00622  [pdf, other]

    cs.CV cs.AI cs.LG

    Roundabout Dilemma Zone Data Mining and Forecasting with Trajectory Prediction and Graph Neural Networks

    Authors: Manthan Chelenahalli Satish, Duo Lu, Bharatesh Chakravarthi, Mohammad Farhadi, Yezhou Yang

    Abstract: Traffic roundabouts, as complex and critical road scenarios, pose significant safety challenges for autonomous vehicles. In particular, the encounter of a vehicle with a dilemma zone (DZ) at a roundabout intersection is a pivotal concern. This paper presents an automated system that leverages trajectory forecasting to predict DZ events, specifically at traffic roundabouts. Our system aims to enhan…

    Submitted 1 September, 2024; originally announced September 2024.

  13. arXiv:2409.00243  [pdf, other]

    cs.GT

    PRADA: Proactive Risk Assessment and Mitigation of Misinformed Demand Attacks on Navigational Route Recommendations

    Authors: Ya-Ting Yang, Haozhe Lei, Quanyan Zhu

    Abstract: Leveraging recent advances in wireless communication, IoT, and AI, intelligent transportation systems (ITS) played an important role in reducing traffic congestion and enhancing user experience. Within ITS, navigational recommendation systems (NRS) are essential for helping users simplify route choices in urban environments. However, NRS are vulnerable to information-based attacks that can manipul…

    Submitted 30 August, 2024; originally announced September 2024.

  14. arXiv:2409.00236  [pdf, other]

    cs.GT

    Adaptive Incentive-Compatible Navigational Route Recommendations in Urban Transportation Networks

    Authors: Ya-Ting Yang, Haozhe Lei, Quanyan Zhu

    Abstract: In urban transportation environments, drivers often encounter various path (route) options when navigating to their destinations. This emphasizes the importance of navigational recommendation systems (NRS), which simplify decision-making and reduce travel time for users while alleviating potential congestion for broader societal benefits. However, recommending the shortest path may cause the flash…

    Submitted 30 August, 2024; originally announced September 2024.

  15. arXiv:2409.00204  [pdf, other]

    eess.IV cs.CV

    MedDet: Generative Adversarial Distillation for Efficient Cervical Disc Herniation Detection

    Authors: Zeyu Zhang, Nengmin Yi, Shengbo Tan, Ying Cai, Yi Yang, Lei Xu, Qingtai Li, Zhang Yi, Daji Ergu, Yang Zhao

    Abstract: Cervical disc herniation (CDH) is a prevalent musculoskeletal disorder that significantly impacts health and requires labor-intensive analysis from experts. Despite advancements in automated detection of medical imaging, two significant challenges hinder the real-world application of these methods. First, the computational complexity and resource demands present a significant gap for real-time app…

    Submitted 30 August, 2024; originally announced September 2024.

  16. arXiv:2409.00162  [pdf, other]

    cs.CL cs.AI

    Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback

    Authors: Jiayi Zhou, Jiaming Ji, Juntao Dai, Yaodong Yang

    Abstract: Aligning the behavior of large language models (LLMs) with human intentions and values remains a critical challenge. Reinforcement learning from human feedback (RLHF) aligns LLMs by training a reward model (RM) on human preferences and fine-tuning the LLMs to maximize RM feedback. Despite its effectiveness and popularity, RLHF is prone to biased local optimization. It means RM fails to provide fee…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: 7 pages

  17. arXiv:2409.00122  [pdf, other]

    eess.SP cs.AI cs.LG

    Brant-X: A Unified Physiological Signal Alignment Framework

    Authors: Daoze Zhang, Zhizhang Yuan, Junru Chen, Kerui Chen, Yang Yang

    Abstract: Physiological signals serve as indispensable clues for understanding various physiological states of human bodies. Most existing works have focused on a single type of physiological signals for a range of application scenarios. However, as the body is a holistic biological system, the inherent interconnection among various physiological data should not be neglected. In particular, given the brain'…

    Submitted 28 August, 2024; originally announced September 2024.

    Comments: Accepted by SIGKDD 2024

    Journal ref: SIGKDD 2024

  18. arXiv:2409.00016  [pdf, other]

    cs.IT eess.SP

    Channel Knowledge Map for Cellular-Connected UAV via Binary Bayesian Filtering

    Authors: Yuhang Yang, Xiaoli Xu, Yong Zeng, Haijian Sun, Rose Qingyang Hu

    Abstract: Channel knowledge map (CKM) is a promising technology to enable environment-aware wireless communications and sensing. Link state map (LSM) is one particular type of CKM that aims to learn the location-specific line-of-sight (LoS) link probability between the transmitter and the receiver at all possible locations, which provides the prior information to enhance the communication quality of dynamic…

    Submitted 16 August, 2024; originally announced September 2024.

  19. arXiv:2408.17054  [pdf]

    cs.CV

    BTMuda: A Bi-level Multi-source unsupervised domain adaptation framework for breast cancer diagnosis

    Authors: Yuxiang Yang, Xinyi Zeng, Pinxian Zeng, Binyu Yan, Xi Wu, Jiliu Zhou, Yan Wang

    Abstract: Deep learning has revolutionized the early detection of breast cancer, resulting in a significant decrease in mortality rates. However, difficulties in obtaining annotations and huge variations in distribution between training sets and real scenes have limited their clinical applications. To address these limitations, unsupervised domain adaptation (UDA) methods have been used to transfer knowledg…

    Submitted 30 August, 2024; originally announced August 2024.

  20. arXiv:2408.16633  [pdf]

    cs.RO cs.AI

    Optimizing Automated Picking Systems in Warehouse Robots Using Machine Learning

    Authors: Keqin Li, Jin Wang, Xubo Wu, Xirui Peng, Runmian Chang, Xiaoyu Deng, Yiwen Kang, Yue Yang, Fanghao Ni, Bo Hong

    Abstract: With the rapid growth of global e-commerce, the demand for automation in the logistics industry is increasing. This study focuses on automated picking systems in warehouses, utilizing deep learning and reinforcement learning technologies to enhance picking efficiency and accuracy while reducing system failure rates. Through empirical analysis, we demonstrate the effectiveness of these technologies…

    Submitted 29 August, 2024; originally announced August 2024.

  21. arXiv:2408.16315  [pdf, other]

    cs.HC cs.LG eess.SP

    Passenger hazard perception based on EEG signals for highly automated driving vehicles

    Authors: Ashton Yu Xuan Tan, Yingkai Yang, Xiaofei Zhang, Bowen Li, Xiaorong Gao, Sifa Zheng, Jianqiang Wang, Xinyu Gu, Jun Li, Yang Zhao, Yuxin Zhang, Tania Stathaki

    Abstract: Enhancing the safety of autonomous vehicles is crucial, especially given recent accidents involving automated systems. As passengers in these vehicles, humans' sensory perception and decision-making can be integrated with autonomous systems to improve safety. This study explores neural mechanisms in passenger-vehicle interactions, leading to the development of a Passenger Cognitive Model (PCM) and…

    Submitted 29 August, 2024; originally announced August 2024.

  22. arXiv:2408.15915  [pdf, other]

    cs.CV cs.AI cs.CL

    Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

    Authors: Yuncheng Yang, Yulei Qin, Tong Wu, Zihan Xu, Gang Li, Pengcheng Guo, Hang Shao, Yucheng Shi, Ke Li, Xing Sun, Jie Yang, Yun Gu

    Abstract: The cultivation of expertise for large language models (LLMs) to solve tasks of specific areas often requires special-purpose tuning with calibrated behaviors on the expected stable outputs. To avoid huge cost brought by manual preparation of instruction datasets and training resources up to hundreds of hours, the exploitation of open knowledge including a wealth of low rank adaptation (LoRA) mode…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 28 pages, 12 tables, 10 figures

  23. arXiv:2408.15813  [pdf, other]

    cs.CV

    DQFormer: Towards Unified LiDAR Panoptic Segmentation with Decoupled Queries

    Authors: Yu Yang, Jianbiao Mei, Liang Liu, Siliang Du, Yilin Xiao, Jongwon Ra, Yong Liu, Xiao Xu, Huifeng Wu

    Abstract: LiDAR panoptic segmentation, which jointly performs instance and semantic segmentation for things and stuff classes, plays a fundamental role in LiDAR perception tasks. While most existing methods explicitly separate these two segmentation tasks and utilize different branches (i.e., semantic and instance branches), some recent methods have embraced the query-based paradigm to unify LiDAR panoptic…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 13 pages, 10 figures

  24. arXiv:2408.15260  [pdf]

    cs.HC cs.CY stat.ME

    Artificial Data, Real Insights: Evaluating Opportunities and Risks of Expanding the Data Ecosystem with Synthetic Data

    Authors: Richard Timpone, Yongwei Yang

    Abstract: Synthetic Data is not new, but recent advances in Generative AI have raised interest in expanding the research toolbox, creating new opportunities and risks. This article provides a taxonomy of the full breadth of the Synthetic Data domain. We discuss its place in the research ecosystem by linking the advances in computational social science with the idea of the Fourth Paradigm of scientific disco…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 38 pages, 10 figures: originally prepared for the 2024 International Conference for Computational Social Science

  25. arXiv:2408.15032  [pdf, other]

    cs.CV cs.AI

    Mamba2MIL: State Space Duality Based Multiple Instance Learning for Computational Pathology

    Authors: Yuqi Zhang, Xiaoqian Zhang, Jiakai Wang, Yuancheng Yang, Taiying Peng, Chao Tong

    Abstract: Computational pathology (CPath) has significantly advanced the clinical practice of pathology. Despite the progress made, Multiple Instance Learning (MIL), a promising paradigm within CPath, continues to face challenges, particularly related to incomplete information utilization. Existing frameworks, such as those based on Convolutional Neural Networks (CNNs), attention, and selective scan space s…

    Submitted 27 August, 2024; originally announced August 2024.

  26. arXiv:2408.14868  [pdf, other]

    cs.CV

    ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning

    Authors: Wenjin Hou, Dingjie Fu, Kun Li, Shiming Chen, Hehe Fan, Yi Yang

    Abstract: Zero-shot learning (ZSL) aims to recognize unseen classes by transferring semantic knowledge from seen classes to unseen ones, guided by semantic information. To this end, existing works have demonstrated remarkable performance by utilizing global visual features from Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) for visual-semantic interactions. Due to the limited receptive f…

    Submitted 27 August, 2024; originally announced August 2024.

  27. arXiv:2408.14197  [pdf, other]

    cs.CV

    Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving

    Authors: Yu Yang, Jianbiao Mei, Yukai Ma, Siliang Du, Wenqing Chen, Yijie Qian, Yuxiang Feng, Yong Liu

    Abstract: World models envision potential future states based on various ego actions. They embed extensive knowledge about the driving environment, facilitating safe and scalable autonomous driving. Most existing methods primarily focus on either data generation or the pretraining paradigms of world models. Unlike the aforementioned prior works, we propose Drive-OccWorld, which adapts a vision-centric 4D fo…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 18 pages, 10 figures

  28. arXiv:2408.13705  [pdf, other]

    cs.CL cs.SD eess.AS

    Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval

    Authors: Lifeng Zhou, Yuke Li, Rui Deng, Yuting Yang, Haoqi Zhu

    Abstract: The success of speech-image retrieval relies on establishing an effective alignment between speech and image. Existing methods often model cross-modal interaction through simple cosine similarity of the global feature of each modality, which fall short in capturing fine-grained details within modalities. To address this issue, we introduce an effective framework and a novel learning task named cro…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2408.13119

  29. arXiv:2408.13627  [pdf, other]

    cs.CV

    Recent Event Camera Innovations: A Survey

    Authors: Bharatesh Chakravarthi, Aayush Atul Verma, Kostas Daniilidis, Cornelia Fermuller, Yezhou Yang

    Abstract: Event-based vision, inspired by the human visual system, offers transformative capabilities such as low latency, high dynamic range, and reduced power consumption. This paper presents a comprehensive survey of event cameras, tracing their evolution over time. It introduces the fundamental principles of event cameras, compares them with traditional frame cameras, and highlights their unique charact…

    Submitted 27 August, 2024; v1 submitted 24 August, 2024; originally announced August 2024.

  30. arXiv:2408.13623  [pdf, other]

    cs.CV

    Prompt-Softbox-Prompt: A free-text Embedding Control for Image Editing

    Authors: Yitong Yang, Yinglin Wang, Jing Wang, Tian Zhang

    Abstract: Text-driven diffusion models have achieved remarkable success in image editing, but a crucial component in these models, text embeddings, has not been fully explored. The entanglement and opacity of text embeddings present significant challenges to achieving precise image editing. In this paper, we provide a comprehensive and in-depth analysis of text embeddings in Stable Diffusion XL, offering thre…

    Submitted 26 August, 2024; v1 submitted 24 August, 2024; originally announced August 2024.

  31. arXiv:2408.13480  [pdf, other]

    cs.DB

    Towards a Converged Relational-Graph Optimization Framework

    Authors: Yunkai Lou, Longbin Lai, Bingqing Lyu, Yufan Yang, Xiaoli Zhou, Wenyuan Yu, Ying Zhang, Jingren Zhou

    Abstract: The recent ISO SQL:2023 standard adopts SQL/PGQ (Property Graph Queries), facilitating graph-like querying within relational databases. This advancement, however, underscores a significant gap in how to effectively optimize SQL/PGQ queries within relational database systems. To address this gap, we extend the foundational SPJ (Select-Project-Join) queries to SPJM queries, which include an additiona…

    Submitted 24 August, 2024; originally announced August 2024.

  32. arXiv:2408.13379  [pdf, other]

    cs.CV cs.AI

    N-DriverMotion: Driver motion learning and prediction using an event-based camera and directly trained spiking neural networks

    Authors: Hyo Jong Chung, Byungkon Kang, Yoonseok Yang

    Abstract: Driver motion recognition is a principal factor in ensuring the safety of driving systems. This paper presents a novel system for learning and predicting driver motions and an event-based high-resolution (1280x720) dataset, N-DriverMotion, newly collected to train on a neuromorphic vision system. The system comprises an event-based camera that generates the first high-resolution driver motion data…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

    MSC Class: 68T45

    ACM Class: I.4.8; I.4.9

  33. arXiv:2408.12822  [pdf, other]

    cs.RO eess.SY

    Courteous MPC for Autonomous Driving with CBF-inspired Risk Assessment

    Authors: Yanze Zhang, Yiwei Lyu, Sude E. Demir, Xingyu Zhou, Yupeng Yang, Junmin Wang, Wenhao Luo

    Abstract: With more autonomous vehicles (AVs) sharing roadways with human-driven vehicles (HVs), ensuring safe and courteous maneuvers that respect HVs' behavior becomes increasingly important. To promote both safety and courtesy in AV's behavior, an extension of Control Barrier Functions (CBFs)-inspired risk evaluation framework is proposed in this paper by considering both noisy observed positions and vel…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 7 pages, accepted to ITSC 2024

  34. arXiv:2408.12664  [pdf, other]

    cs.AI q-bio.NC

    Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience

    Authors: Zhonghao He, Jascha Achterberg, Katie Collins, Kevin Nejad, Danyal Akarca, Yinzhu Yang, Wes Gurnee, Ilia Sucholutsky, Yuhan Tang, Rebeca Ianov, George Ogden, Chole Li, Kai Sandbrink, Stephen Casper, Anna Ivanova, Grace W. Lindsay

    Abstract: As deep learning systems are scaled up to many billions of parameters, relating their internal structure to external behaviors becomes very challenging. Although daunting, this problem is not new: Neuroscientists and cognitive scientists have accumulated decades of experience analyzing a particularly complex system - the brain. In this work, we argue that interpreting both biological and artificia…

    Submitted 25 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  35. arXiv:2408.12605  [pdf]

    eess.IV cs.AI cs.CV

    Convolutional Neural Networks for Predictive Modeling of Lung Disease

    Authors: Yingbin Liang, Xiqing Liu, Haohao Xia, Yiru Cang, Zitao Zheng, Yuanfang Yang

    Abstract: In this paper, Pro-HRnet-CNN, an innovative model combining HRNet and void-convolution techniques, is proposed for disease prediction under lung imaging. Through the experimental comparison on the authoritative LIDC-IDRI dataset, we found that compared with the traditional ResNet-50, Pro-HRnet-CNN showed better performance in the feature extraction and recognition of small-size nodules, significan…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 7 pages

  36. arXiv:2408.12483  [pdf, other]

    cs.CV cs.AI

    Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation

    Authors: Shaobo Wang, Yantai Yang, Qilong Wang, Kaixin Li, Linfeng Zhang, Junchi Yan

    Abstract: Dataset Distillation (DD) aims to synthesize a small dataset capable of performing comparably to the original dataset. Despite the success of numerous DD methods, theoretical exploration of this area remains unaddressed. In this paper, we take an initial step towards understanding various matching-based DD methods from the perspective of sample difficulty. We begin by empirically examining sample…

    Submitted 22 August, 2024; originally announced August 2024.

  37. arXiv:2408.12130  [pdf, other]

    cs.AI

    S-EPOA: Overcoming the Indivisibility of Annotations with Skill-Driven Preference-Based Reinforcement Learning

    Authors: Ni Mu, Yao Luan, Yiqin Yang, Qing-shan Jia

    Abstract: Preference-based reinforcement learning (PbRL) stands out by utilizing human preferences as a direct reward signal, eliminating the need for intricate reward engineering. However, despite its potential, traditional PbRL methods are often constrained by the indivisibility of annotations, which impedes the learning process. In this paper, we introduce a groundbreaking approach, Skill-Enhanced Prefer…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Submitted to AAAI 2025

  38. arXiv:2408.12086  [pdf, other]

    cs.CV cs.AI

    Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy

    Authors: Hong Zhang, Yixuan Lyu, Qian Yu, Hanyang Liu, Huimin Ma, Ding Yuan, Yifan Yang

    Abstract: In the domain of Camouflaged Object Segmentation (COS), despite continuous improvements in segmentation performance, the underlying mechanisms of effective camouflage remain poorly understood, akin to a black box. To address this gap, we present the first comprehensive study to examine the impact of camouflage attributes on the effectiveness of camouflage patterns, offering a quantitative framewor…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  39. arXiv:2408.11878  [pdf, other]

    cs.CL cs.CE q-fin.CP

    Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

    Authors: Qianqian Xie, Dong Li, Mengxi Xiao, Zihao Jiang, Ruoyu Xiang, Xiao Zhang, Zhengyu Chen, Yueru He, Weiguang Han, Yuzhe Yang, Shunian Chen, Yifei Zhang, Lihang Shen, Daniel Kim, Zhiwei Liu, Zheheng Luo, Yangyang Yu, Yupeng Cao, Zhiyang Deng, Zhiyuan Yao, Haohang Li, Duanyu Feng, Yongfu Dai, VijayaSai Somasundaram, Peng Lu, et al. (14 additional authors not shown)

    Abstract: Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data. To address these limitations, we introduce Open-FinLLMs, a series of Financial LLMs. We begin with FinLLaMA, pre-trained on a 52 billion token financial corpus, incorporating text, table…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 33 pages, 13 figures

  40. arXiv:2408.11491  [pdf, other]

    cs.AI

    Nothing in Excess: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering

    Authors: Zouying Cao, Yifei Yang, Hai Zhao

    Abstract: Safety alignment is indispensable for large language models (LLMs) to defend against threats from malicious instructions. However, recent research reveals that safety-aligned LLMs are prone to rejecting benign queries due to the exaggerated safety issue, limiting their helpfulness. In this paper, we propose a Safety-Conscious Activation Steering (SCANS) method to mitigate the exaggerated safety concerns in aligned L…

    Submitted 21 August, 2024; originally announced August 2024.

  41. arXiv:2408.11405  [pdf, other]

    cs.SD eess.AS

    DDSP Guitar Amp: Interpretable Guitar Amplifier Modeling

    Authors: Yen-Tung Yeh, Yu-Hua Chen, Yuan-Chiao Cheng, Jui-Te Wu, Jun-Jie Fu, Yi-Fan Yeh, Yi-Hsuan Yang

    Abstract: Neural network models for guitar amplifier emulation, while being effective, often demand high computational cost and lack interpretability. Drawing ideas from physical amplifier design, this paper aims to address these issues with a new differentiable digital signal processing (DDSP)-based model, called "DDSP guitar amp," that models the four components of a guitar amp (i.e., preamp, tone stack…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Preprint paper

  42. arXiv:2408.10588  [pdf, other]

    cs.CV cs.GR

    DEGAS: Detailed Expressions on Full-Body Gaussian Avatars

    Authors: Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang

    Abstract: Although neural rendering has made significant advancements in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method lea…

    Submitted 20 August, 2024; originally announced August 2024.

  43. arXiv:2408.10285  [pdf, other]

    cs.LG cs.AI cs.CE

    BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction

    Authors: Yifei Yang, Runhan Shi, Zuchao Li, Shu Jiang, Bao-Liang Lu, Yang Yang, Hai Zhao

    Abstract: Retrosynthesis analysis is pivotal yet challenging in drug discovery and organic chemistry. Despite the proliferation of computational tools over the past decade, AI-based systems often fall short in generalizing across diverse reaction types and exploring alternative synthetic pathways. This paper presents BatGPT-Chem, a large language model with 15 billion parameters, tailored for enhanced retro…

    Submitted 19 August, 2024; originally announced August 2024.

  44. arXiv:2408.09943  [pdf]

    cs.CR

    Calibrating Noise for Group Privacy in Subsampled Mechanisms

    Authors: Yangfan Jiang, Xinjian Luo, Yin Yang, Xiaokui Xiao

    Abstract: Given a group size m and a sensitive dataset D, group privacy (GP) releases information about D with the guarantee that the adversary cannot infer with high confidence whether the underlying data is D or a neighboring dataset D' that differs from D by m records. GP generalizes the well-established notion of differential privacy (DP) for protecting individuals' privacy; in particular, when m=1, GP…

    Submitted 24 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: accepted for publication in Proceedings of VLDB Endowment (PVLDB) 2025

  45. arXiv:2408.09851  [pdf, other]

    cs.NI eess.SY

    ISAC-Fi: Enabling Full-fledged Monostatic Sensing over Wi-Fi Communication

    Authors: Zhe Chen, Chao Hu, Tianyue Zheng, Hangcheng Cao, Yanbing Yang, Yen Chu, Hongbo Jiang, Jun Luo

    Abstract: Whereas Wi-Fi communications have been exploited for sensing purpose for over a decade, the bistatic or multistatic nature of Wi-Fi still poses multiple challenges, hampering real-life deployment of integrated sensing and communication (ISAC) within Wi-Fi framework. In this paper, we aim to re-design WiFi so that monostatic sensing (mimicking radar) can be achieved over the multistatic communicati…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 14 pages, 22 figures

  46. arXiv:2408.09768  [pdf, other]

    cs.AI

    MalLight: Influence-Aware Coordinated Traffic Signal Control for Traffic Signal Malfunctions

    Authors: Qinchen Yang, Zejun Xie, Hua Wei, Desheng Zhang, Yu Yang

    Abstract: Urban traffic is subject to disruptions that cause extended waiting time and safety issues at signalized intersections. While numerous studies have addressed the issue of intelligent traffic systems in the context of various disturbances, traffic signal malfunction, a common real-world occurrence with significant repercussions, has received comparatively limited attention. The primary objective of…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: Paper accepted to CIKM24 Full Research track

  47. arXiv:2408.09706  [pdf, other]

    cs.CV

    MePT: Multi-Representation Guided Prompt Tuning for Vision-Language Model

    Authors: Xinyang Wang, Yi Yang, Minfeng Zhu, Kecheng Zheng, Shi Liu, Wei Chen

    Abstract: Recent advancements in pre-trained Vision-Language Models (VLMs) have highlighted the significant potential of prompt tuning for adapting these models to a wide range of downstream tasks. However, existing prompt tuning methods typically map an image to a single representation, limiting the model's ability to capture the diverse ways an image can be described. To address this limitation, we invest…

    Submitted 19 August, 2024; originally announced August 2024.

  48. arXiv:2408.09588  [pdf, other]

    cs.AI

    SynTraC: A Synthetic Dataset for Traffic Signal Control from Traffic Monitoring Cameras

    Authors: Tiejin Chen, Prithvi Shirke, Bharatesh Chakravarthi, Arpitsinh Vaghela, Longchao Da, Duo Lu, Yezhou Yang, Hua Wei

    Abstract: This paper introduces SynTraC, the first public image-based traffic signal control dataset, aimed at bridging the gap between simulated environments and real-world traffic management challenges. Unlike traditional datasets for traffic signal control which aim to provide simplified feature vectors like vehicle counts from traffic simulators, SynTraC provides real-style images from the CARLA simulat…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted to IEEE ITSC2024

  49. arXiv:2408.09172   

    cs.AI cs.CL

    Unc-TTP: A Method for Classifying LLM Uncertainty to Improve In-Context Example Selection

    Authors: Hsiu-Yuan Huang, Zichen Wu, Yutong Yang, Junzhao Zhang, Yunfang Wu

    Abstract: Nowadays, Large Language Models (LLMs) have demonstrated exceptional performance across various downstream tasks. However, it is challenging for users to discern whether the responses are generated with certainty or are fabricated to meet user expectations. Estimating the uncertainty of LLMs is particularly challenging due to their vast scale and the lack of white-box access. In this work, we prop…

    Submitted 24 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

    Comments: The model diagram in Figure 1 on page 3 of the paper has significant ambiguities. It may lead readers to mistakenly believe that the experiments were conducted in a multi-turn dialogue format. Therefore, we request the withdrawal of this submission.

  50. arXiv:2408.08978  [pdf, other]

    cs.CL

    See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses

    Authors: Yulong Chen, Yang Liu, Jianhao Yan, Xuefeng Bai, Ming Zhong, Yinghao Yang, Ziyi Yang, Chenguang Zhu, Yue Zhang

    Abstract: The impressive performance of Large Language Models (LLMs) has consistently surpassed numerous human-designed benchmarks, presenting new challenges in assessing the shortcomings of LLMs. Designing tasks and finding LLMs' limitations are becoming increasingly important. In this paper, we investigate the question of whether an LLM can discover its own limitations from the errors it makes. To this en…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: COLM 2024