Showing 1–50 of 4,054 results for author: Wang, S

Searching in archive cs.
  1. arXiv:2409.03087  [pdf, other]

    eess.IV cs.CV

    Coupling AI and Citizen Science in Creation of Enhanced Training Dataset for Medical Image Segmentation

    Authors: Amir Syahmi, Xiangrong Lu, Yinxuan Li, Haoxuan Yao, Hanjun Jiang, Ishita Acharya, Shiyi Wang, Yang Nan, Xiaodan Xing, Guang Yang

    Abstract: Recent advancements in medical imaging and artificial intelligence (AI) have greatly enhanced diagnostic capabilities, but the development of effective deep learning (DL) models is still constrained by the lack of high-quality annotated datasets. The traditional manual annotation process by medical experts is time- and resource-intensive, limiting the scalability of these datasets. In this work, w…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.02877  [pdf, other]

    cs.AI cs.CL cs.LG

    Configurable Foundation Models: Building LLMs from a Modular Perspective

    Authors: Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

    Abstract: Advancements in LLMs have recently unveiled challenges tied to computational efficiency and continual scalability due to their requirements of huge parameters, making the applications and evolution of these models on devices with limited computation resources and scenarios requiring various abilities increasingly cumbersome. Inspired by modularity within the human brain, there is a growing tendenc…

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2409.01995  [pdf, other]

    eess.AS cs.AI cs.SD

    vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders

    Authors: Yiwei Guo, Zhihan Li, Junjie Li, Chenpeng Du, Hankun Wang, Shuai Wang, Xie Chen, Kai Yu

    Abstract: We propose a new speech discrete token vocoder, vec2wav 2.0, which advances voice conversion (VC). We use discrete tokens from speech self-supervised models as the content features of source speech, and treat VC as a prompted vocoding task. To amend the loss of speaker timbre in the content tokens, vec2wav 2.0 utilizes the WavLM features to provide strong timbre-dependent information. A novel adap…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 5 pages, 4 figures

  4. arXiv:2409.01491  [pdf, other]

    cs.CV cs.AI

    EarthGen: Generating the World from Top-Down Views

    Authors: Ansh Sharma, Albert Xiao, Praneet Rathi, Rohit Kundu, Albert Zhai, Yuan Shen, Shenlong Wang

    Abstract: In this work, we present a novel method for extensive multi-scale generative terrain modeling. At the core of our model is a cascade of superresolution diffusion models that can be combined to produce consistent images across multiple resolutions. Pairing this concept with a tiled generation method yields a scalable system that can generate thousands of square kilometers of realistic Earth surface…

    Submitted 2 September, 2024; originally announced September 2024.

    ACM Class: J.2; I.4.8

  5. arXiv:2409.01420  [pdf, other]

    cs.LG

    Erasure Coded Neural Network Inference via Fisher Averaging

    Authors: Divyansh Jhunjhunwala, Neharika Jali, Gauri Joshi, Shiqiang Wang

    Abstract: Erasure-coded computing has been successfully used in cloud systems to reduce tail latency caused by factors such as straggling servers and heterogeneous traffic variations. A majority of cloud computing traffic now consists of inference on neural networks on shared resources where the response time of inference queries is also adversely affected by the same factors. However, current erasure codin…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted to ISIT 2024

  6. ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers

    Authors: Luoyu Mei, Shuai Wang, Yun Cheng, Ruofeng Liu, Zhimeng Yin, Wenchao Jiang, Shuai Wang, Wei Gong

    Abstract: Semantic recognition is pivotal in virtual reality (VR) applications, enabling immersive and interactive experiences. A promising approach is utilizing millimeter-wave (mmWave) signals to generate point clouds. However, the high computational and memory demands of current mmWave point cloud models hinder their efficiency and reliability. To address this limitation, our paper introduces ESP-PCT, a…

    Submitted 2 September, 2024; originally announced September 2024.

    Journal ref: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024

  7. arXiv:2409.00923  [pdf, other]

    cs.RO cs.AI

    Development of Occupancy Prediction Algorithm for Underground Parking Lots

    Authors: Shijie Wang

    Abstract: The core objective of this study is to address the perception challenges faced by autonomous driving in adverse environments like basements. Initially, this paper commences with data collection in an underground garage. A simulated underground garage model is established within the CARLA simulation environment, and SemanticKITTI format occupancy ground truth data is collected in this simulated set…

    Submitted 1 September, 2024; originally announced September 2024.

  8. arXiv:2409.00920  [pdf, other]

    cs.LG cs.AI cs.CL

    ToolACE: Winning the Points of LLM Function Calling

    Authors: Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong Liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, et al. (2 additional authors not shown)

    Abstract: Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic ag…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 21 pages, 22 figures

  9. arXiv:2409.00664  [pdf, other]

    q-bio.NC cs.LG

    Video-based Analysis Reveals Atypical Social Gaze in People with Autism Spectrum Disorder

    Authors: Xiangxu Yu, Mindi Ruan, Chuanbo Hu, Wenqi Li, Lynn K. Paul, Xin Li, Shuo Wang

    Abstract: In this study, we present a quantitative and comprehensive analysis of social gaze in people with autism spectrum disorder (ASD). Diverging from traditional first-person camera perspectives based on eye-tracking technologies, this study utilizes a third-person perspective database from the Autism Diagnostic Observation Schedule, 2nd Edition (ADOS-2) interview videos, encompassing ASD participants…

    Submitted 1 September, 2024; originally announced September 2024.

  10. arXiv:2409.00543  [pdf, other]

    cs.CV cs.CL cs.LG eess.IV

    How Does Diverse Interpretability of Textual Prompts Impact Medical Vision-Language Zero-Shot Tasks?

    Authors: Sicheng Wang, Che Liu, Rossella Arcucci

    Abstract: Recent advancements in medical vision-language pre-training (MedVLP) have significantly enhanced zero-shot medical vision tasks such as image classification by leveraging large-scale medical image-text pair pre-training. However, the performance of these tasks can be heavily influenced by the variability in textual prompts describing the categories, necessitating robustness in MedVLP models to div…

    Submitted 31 August, 2024; originally announced September 2024.

  11. arXiv:2409.00509  [pdf, other]

    cs.CL

    LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models

    Authors: Zhiyuan Hu, Yuliang Liu, Jinman Zhao, Suyuchen Wang, Yan Wang, Wei Shen, Qing Gu, Anh Tuan Luu, See-Kiong Ng, Zhiwei Jiang, Bryan Hooi

    Abstract: Large language models (LLMs) face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences. Meanwhile, extending the context window in LLMs through post-pretraining is highly resource-intensive. To address this, we introduce LongRecipe, an efficient training s…

    Submitted 4 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: Work in Progress

  12. arXiv:2409.00489  [pdf]

    cs.CV cs.AI

    Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability

    Authors: Chia-Yu Hsu, Wenwen Li, Sizhe Wang

    Abstract: Research on geospatial foundation models (GFMs) has become a trending topic in geospatial artificial intelligence (AI) research due to their potential for achieving high generalizability and domain adaptability, reducing model training costs for individual researchers. Unlike large language models, such as ChatGPT, constructing visual foundation models for image analysis, particularly in remote se…

    Submitted 31 August, 2024; originally announced September 2024.

  13. arXiv:2409.00017  [pdf, other]

    cs.HC

    Could Micro-Expressions be Quantified? Electromyography Gives Affirmative Evidence

    Authors: Jingting Li, Shaoyuan Lu, Yan Wang, Zizhao Dong, Su-Jing Wang, Xiaolan Fu

    Abstract: Micro-expressions (MEs) are brief, subtle facial expressions that reveal concealed emotions, offering key behavioral cues for social interaction. Characterized by short duration, low intensity, and spontaneity, MEs have been mostly studied through subjective coding, lacking objective, quantitative indicators. This paper explores ME characteristics using facial electromyography (EMG), analyzing dat…

    Submitted 16 August, 2024; originally announced September 2024.

  14. arXiv:2408.17135  [pdf, other]

    cs.CV

    Temporal and Interactive Modeling for Efficient Human-Human Motion Generation

    Authors: Yabiao Wang, Shuo Wang, Jiangning Zhang, Ke Fan, Jiafu Wu, Zhengkai Jiang, Yong Liu

    Abstract: Human-human motion generation is essential for understanding humans as social beings. Although several transformer-based methods have been proposed, they typically model each individual separately and overlook the causal relationships in temporal motion sequences. Furthermore, the attention mechanism in transformers exhibits quadratic computational complexity, significantly reducing their efficien…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Homepage: https://aigc-explorer.github.io/TIM-page/

  15. arXiv:2408.16756  [pdf, other]

    cs.CL

    How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models

    Authors: Jiyue Jiang, Liheng Chen, Pengan Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu

    Abstract: The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages. However, underrepresented languages like Cantonese, spoken by over 85 million people, face significant development gaps, which is particularly concerning given the economic significance of the Guangdong-Hong Kong…

    Submitted 29 August, 2024; originally announced August 2024.

  16. arXiv:2408.16706  [pdf, other]

    cs.PL cs.SE

    Incremental Context-free Grammar Inference in Black Box Settings

    Authors: Feifei Li, Xiao Chen, Xi Xiao, Xiaoyu Sun, Chuan Chen, Shaohua Wang, Jitao Han

    Abstract: Black-box context-free grammar inference presents a significant challenge in many practical settings due to limited access to example programs. The state-of-the-art methods, Arvada and Treevada, employ heuristic approaches to generalize grammar rules, initiating from flat parse trees and exploring diverse generalization sequences. We have observed that these approaches suffer from low quality and…

    Submitted 29 August, 2024; originally announced August 2024.

  17. arXiv:2408.16659  [pdf, other]

    physics.med-ph cs.GR

    Motion-Driven Neural Optimizer for Prophylactic Braces Made by Distributed Microstructures

    Authors: Xingjian Han, Yu Jiang, Weiming Wang, Guoxin Fang, Simeon Gill, Zhiqiang Zhang, Shengfa Wang, Jun Saito, Deepak Kumar, Zhongxuan Luo, Emily Whiting, Charlie C. L. Wang

    Abstract: Joint injuries, and their long-term consequences, present a substantial global health burden. Wearable prophylactic braces are an attractive potential solution to reduce the incidence of joint injuries by limiting joint movements that are related to injury risk. Given human motion and ground reaction forces, we present a computational framework that enables the design of personalized braces by opt…

    Submitted 29 August, 2024; originally announced August 2024.

  18. arXiv:2408.16530  [pdf, other]

    cs.CV

    A Comprehensive Review of 3D Object Detection in Autonomous Driving: Technological Advances and Future Directions

    Authors: Yu Wang, Shaohua Wang, Yicheng Li, Mingchun Liu

    Abstract: In recent years, 3D object perception has become a crucial component in the development of autonomous driving systems, providing essential environmental awareness. However, as perception tasks in autonomous driving evolve, their variants have increased, leading to diverse insights from industry and academia. Currently, there is a lack of comprehensive surveys that collect and summarize these perce…

    Submitted 27 August, 2024; originally announced August 2024.

  19. CanCal: Towards Real-time and Lightweight Ransomware Detection and Response in Industrial Environments

    Authors: Shenao Wang, Feng Dong, Hangfeng Yang, Jingheng Xu, Haoyu Wang

    Abstract: Ransomware attacks have emerged as one of the most significant cybersecurity threats. Despite numerous proposed detection and defense methods, existing approaches face two fundamental limitations in large-scale industrial applications: intolerable system overheads and notorious alert fatigue. To address these challenges, we propose CanCal, a real-time and lightweight ransomware detection system. S…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: To appear in the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS'24), October 14–18, 2024, Salt Lake City

  20. arXiv:2408.16268  [pdf, other]

    cs.CV

    UDD: Dataset Distillation via Mining Underutilized Regions

    Authors: Shiguang Wang, Zhongyu Zhang, Jian Cheng

    Abstract: Dataset distillation synthesizes a small dataset such that a model trained on this set approximates the performance of the original dataset. Recent studies on dataset distillation focused primarily on the design of the optimization process, with methods such as gradient matching, feature alignment, and training trajectory matching. However, little attention has been given to the issue of underutil…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: PRCV2024

  21. arXiv:2408.16237  [pdf, other]

    cs.DB

    MQRLD: A Multimodal Data Retrieval Platform with Query-aware Feature Representation and Learned Index Based on Data Lake

    Authors: Ming Sheng, Shuliang Wang, Yong Zhang, Kaige Wang, Jingyi Wang, Yi Luo, Rui Hao

    Abstract: Multimodal data has become a crucial element in the realm of big data analytics, driving advancements in data exploration, data mining, and empowering artificial intelligence applications. To support high-quality retrieval for these cutting-edge applications, a robust data retrieval platform should meet the requirements for transparent data storage, rich hybrid queries, effective feature represent…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 36 pages, 28 figures

  22. arXiv:2408.16233  [pdf, other]

    cs.CV

    PSE-Net: Channel Pruning for Convolutional Neural Networks with Parallel-subnets Estimator

    Authors: Shiguang Wang, Tao Xie, Haijun Liu, Xingcheng Zhang, Jian Cheng

    Abstract: Channel Pruning is one of the most widespread techniques used to compress deep neural networks while maintaining their performances. Currently, a typical pruning algorithm leverages neural architecture search to directly find networks with a configurable width, the key step of which is to identify representative subnet for various pruning ratios by training a supernet. However, current methods mai…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 10 pages, Neural Networks

  23. arXiv:2408.15998  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

    Authors: Min Shi, Fuxiao Liu, Shihao Wang, Shijia Liao, Subhashree Radhakrishnan, De-An Huang, Hongxu Yin, Karan Sapra, Yaser Yacoob, Humphrey Shi, Bryan Catanzaro, Andrew Tao, Jan Kautz, Zhiding Yu, Guilin Liu

    Abstract: The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks, such as optical character recognition and document analysis. A number of recent MLLMs achieve this goal using a mixture of vis…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Github: https://github.com/NVlabs/Eagle, HuggingFace: https://huggingface.co/NVEagle

  24. arXiv:2408.15585  [pdf, other]

    cs.SD eess.AS

    Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models

    Authors: Yiyang Zhao, Shuai Wang, Guangzhi Sun, Zehua Chen, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

    Abstract: In this paper, Whisper, a large-scale pre-trained model for automatic speech recognition, is proposed to apply to speaker verification. A partial multi-scale feature aggregation (PMFA) approach is proposed based on a subset of Whisper encoder blocks to derive highly discriminative speaker embeddings. Experimental results demonstrate that using the middle to later blocks of the Whisper encoder keeps…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by Interspeech 2024

  25. arXiv:2408.15474  [pdf, other]

    eess.AS cs.SD

    Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation

    Authors: Ziqian Ning, Shuai Wang, Yuepeng Jiang, Jixun Yao, Lei He, Shifeng Pan, Jie Ding, Lei Xie

    Abstract: Rap, a prominent genre of vocal performance, remains underexplored in vocal generation. General vocal synthesis depends on precise note and duration inputs, requiring users to have related musical knowledge, which limits flexibility. In contrast, rap typically features simpler melodies, with a core focus on a strong rhythmic sense that harmonizes with accompanying beats. In this paper, we propose…

    Submitted 27 August, 2024; originally announced August 2024.

  26. arXiv:2408.15069  [pdf]

    cs.CV eess.IV physics.ins-det

    Geometric Artifact Correction for Symmetric Multi-Linear Trajectory CT: Theory, Method, and Generalization

    Authors: Zhisheng Wang, Yanxu Sun, Shangyu Li, Legeng Lin, Shunli Wang, Junning Cui

    Abstract: For extending CT field-of-view to perform non-destructive testing, the Symmetric Multi-Linear trajectory Computed Tomography (SMLCT) has been developed as a successful example of non-standard CT scanning modes. However, inevitable geometric errors can cause severe artifacts in the reconstructed images. The existing calibration method for SMLCT is both crude and inefficient. It involves reconstruct…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 15 pages, 10 figures

    MSC Class: 68U10 (Primary) 68V99; 68Q30 (Secondary)

  27. arXiv:2408.14734  [pdf]

    cs.LG math-ph math.NA

    General-Kindred Physics-Informed Neural Network to the Solutions of Singularly Perturbed Differential Equations

    Authors: Sen Wang, Peizhi Zhao, Qinglong Ma, Tao Song

    Abstract: Physics-Informed Neural Networks (PINNs) have become a promising research direction in the field of solving Partial Differential Equations (PDEs). Dealing with singular perturbation problems continues to be a difficult challenge in the field of PINN. The solution of singular perturbation problems often exhibits sharp boundary layers and steep gradients, and traditional PINN cannot achieve approxim…

    Submitted 26 August, 2024; originally announced August 2024.

  28. arXiv:2408.14689  [pdf, other]

    cs.IR

    Federated User Preference Modeling for Privacy-Preserving Cross-Domain Recommendation

    Authors: Li Wang, Shoujin Wang, Quangui Zhang, Qiang Wu, Min Xu

    Abstract: Cross-domain recommendation (CDR) aims to address the data-sparsity problem by transferring knowledge across domains. Existing CDR methods generally assume that the user-item interaction data is shareable between domains, which leads to privacy leakage. Recently, some privacy-preserving CDR (PPCDR) models have been proposed to solve this problem. However, they primarily transfer simple representat…

    Submitted 26 August, 2024; originally announced August 2024.

  29. arXiv:2408.14453  [pdf]

    cs.LG eess.IV eess.SP

    Reconstructing physiological signals from fMRI across the adult lifespan

    Authors: Shiyu Wang, Ziyuan Xu, Yamin Li, Mara Mather, Roza G. Bayrak, Catie Chang

    Abstract: Interactions between the brain and body are of fundamental importance for human behavior and health. Functional magnetic resonance imaging (fMRI) captures whole-brain activity noninvasively, and modeling how fMRI signals interact with physiological dynamics of the body can provide new insight into brain function and offer potential biomarkers of disease. However, physiological recordings are not a…

    Submitted 26 August, 2024; originally announced August 2024.

  30. arXiv:2408.14342  [pdf, other]

    cs.CV physics.med-ph

    Dual-Domain CLIP-Assisted Residual Optimization Perception Model for Metal Artifact Reduction

    Authors: Xinrui Zhang, Ailong Cai, Shaoyu Wang, Linyuan Wang, Zhizhong Zheng, Lei Li, Bin Yan

    Abstract: Metal artifacts in computed tomography (CT) imaging pose significant challenges to accurate clinical diagnosis. The presence of high-density metallic implants results in artifacts that deteriorate image quality, manifesting in the forms of streaking, blurring, or beam hardening effects, etc. Nowadays, various deep learning-based approaches, particularly generative models, have been proposed for me…

    Submitted 29 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 14 pages, 18 figures

  31. arXiv:2408.14158  [pdf, other]

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic…

    Submitted 31 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version

  32. arXiv:2408.13735  [pdf, other]

    cs.CV

    MSVM-UNet: Multi-Scale Vision Mamba UNet for Medical Image Segmentation

    Authors: Chaowei Chen, Li Yu, Shiquan Min, Shunfang Wang

    Abstract: State Space Models (SSMs), especially Mamba, have shown great promise in medical image segmentation due to their ability to model long-range dependencies with linear computational complexity. However, accurate medical image segmentation requires the effective learning of both multi-scale detailed feature representations and global contextual dependencies. Although existing works have attempted to…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  33. arXiv:2408.13654  [pdf, other]

    cs.CL

    Symbolic Working Memory Enhances Language Models for Complex Rule Application

    Authors: Siyuan Wang, Zhongyu Wei, Yejin Choi, Xiang Ren

    Abstract: Large Language Models (LLMs) have shown remarkable reasoning performance but struggle with multi-step deductive reasoning involving a series of rule application steps, especially when rules are presented non-sequentially. Our preliminary analysis shows that while LLMs excel in single-step rule application, their performance drops significantly in multi-step scenarios due to the challenge in rule g…

    Submitted 24 August, 2024; originally announced August 2024.

  34. arXiv:2408.13580  [pdf, ps, other]

    econ.TH cs.GT math.OC

    Semi-Separable Mechanisms in Multi-Item Robust Screening

    Authors: Shixin Wang

    Abstract: It is generally challenging to characterize the optimal selling mechanism even when the seller knows the buyer's valuation distributions in multi-item screening. An insightful and significant result in robust mechanism design literature is that if the seller knows only marginal distributions of the buyer's valuation, then separable mechanisms, in which all items are sold independently, are robustl…

    Submitted 24 August, 2024; originally announced August 2024.

  35. arXiv:2408.13357  [pdf, other]

    cs.IR

    SEQ+MD: Learning Multi-Task as a SEQuence with Multi-Distribution Data

    Authors: Siqi Wang, Audrey Zhijiao Chen, Austin Clapp, Sheng-Min Shih, Xiaoting Zhao

    Abstract: In e-commerce, the order in which search results are displayed when a customer tries to find relevant listings can significantly impact their shopping experience and search efficiency. Tailored re-ranking system based on relevance and engagement signals in E-commerce has often shown improvement on sales and gross merchandise value (GMV). Designing algorithms for this purpose is even more challengi…

    Submitted 23 August, 2024; originally announced August 2024.

  36. arXiv:2408.13338  [pdf, other]

    cs.HC cs.AI cs.CL

    LalaEval: A Holistic Human Evaluation Framework for Domain-Specific Large Language Models

    Authors: Chongyan Sun, Ken Lin, Shiwei Wang, Hulong Wu, Chengfei Fu, Zhen Wang

    Abstract: This paper introduces LalaEval, a holistic framework designed for the human evaluation of domain-specific large language models (LLMs). LalaEval proposes a comprehensive suite of end-to-end protocols that cover five main components including domain specification, criteria establishment, benchmark dataset creation, construction of evaluation rubrics, and thorough analysis and interpretation of eval…

    Submitted 23 August, 2024; originally announced August 2024.

  37. arXiv:2408.13290  [pdf, ps, other]

    eess.IV cs.CV

    Multi-modal Intermediate Feature Interaction AutoEncoder for Overall Survival Prediction of Esophageal Squamous Cell Cancer

    Authors: Chengyu Wu, Yatao Zhang, Yaqi Wang, Qifeng Wang, Shuai Wang

    Abstract: Survival prediction for esophageal squamous cell cancer (ESCC) is crucial for doctors to assess a patient's condition and tailor treatment plans. The application and development of multi-modal deep learning in this field have attracted attention in recent years. However, the prognostically relevant features between cross-modalities have not been further explored in previous studies, which could hi…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by ISBI 2024

  38. arXiv:2408.12852  [pdf, other]

    cs.IR

    Structural Representation Learning and Disentanglement for Evidential Chinese Patent Approval Prediction

    Authors: Jinzhi Shan, Qi Zhang, Chongyang Shi, Mengting Gui, Shoujin Wang, Usman Naseem

    Abstract: Automatic Chinese patent approval prediction is an emerging and valuable task in patent analysis. However, it involves a rigorous and transparent decision-making process that includes patent comparison and examination to assess its innovation and correctness. This resultant necessity of decision evidentiality, coupled with intricate patent comprehension presents significant challenges and obstacle…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: CIKM 2024, 10 Pages

  39. arXiv:2408.12757  [pdf, other]

    cs.DC

    NanoFlow: Towards Optimal Large Language Model Serving Throughput

    Authors: Kan Zhu, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Yufei Gao, Qinyu Xu, Tian Tang, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci

    Abstract: The increasing usage of Large Language Models (LLMs) has resulted in a surging demand for planet-scale serving systems, where tens of thousands of GPUs continuously serve hundreds of millions of users. Consequently, throughput (under reasonable latency constraints) has emerged as a key metric that determines serving systems' performance. To boost throughput, various methods of inter-device paralle…

    Submitted 22 August, 2024; originally announced August 2024.

  40. arXiv:2408.12483  [pdf, other]

    cs.CV cs.AI

    Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation

    Authors: Shaobo Wang, Yantai Yang, Qilong Wang, Kaixin Li, Linfeng Zhang, Junchi Yan

    Abstract: Dataset Distillation (DD) aims to synthesize a small dataset capable of performing comparably to the original dataset. Despite the success of numerous DD methods, theoretical exploration of this area remains unaddressed. In this paper, we take an initial step towards understanding various matching-based DD methods from the perspective of sample difficulty. We begin by empirically examining sample…

    Submitted 22 August, 2024; originally announced August 2024.

  41. arXiv:2408.12316  [pdf, other]

    cs.CV

    Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement

    Authors: Lingyu Zhu, Wenhan Yang, Baoliang Chen, Hanwei Zhu, Zhangkai Ni, Qi Mao, Shiqi Wang

    Abstract: Obtaining pairs of low/normal-light videos, with motions, is more challenging than still images, which raises technical issues and poses the technical route of unpaired learning as a critical role. This paper makes endeavors in the direction of learning for low-light video enhancement without using paired ground truth. Compared to low-light image enhancement, enhancing low-light videos is more dif…

    Submitted 22 August, 2024; originally announced August 2024.

  42. arXiv:2408.12300  [pdf, other]

    cs.LG

    Tackling Data Heterogeneity in Federated Learning via Loss Decomposition

    Authors: Shuang Zeng, Pengxin Guo, Shuai Wang, Jianbo Wang, Yuyin Zhou, Liangqiong Qu

    Abstract: Federated Learning (FL) is a rising approach towards collaborative and privacy-preserving machine learning where large-scale medical datasets remain localized to each client. However, the issue of data heterogeneity among clients often compels local models to diverge, leading to suboptimal global models. To mitigate the impact of data heterogeneity on FL performance, we start with analyzing how FL…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted at MICCAI 2024

  43. arXiv:2408.12007  [pdf, other]

    cs.LG cs.AI stat.ML

    QuaCK-TSF: Quantum-Classical Kernelized Time Series Forecasting

    Authors: Abdallah Aaraba, Soumaya Cherkaoui, Ola Ahmad, Jean-Frédéric Laprade, Olivier Nahman-Lévesque, Alexis Vieloszynski, Shengrui Wang

    Abstract: Forecasting in probabilistic time series is a complex endeavor that extends beyond predicting future values to also quantifying the uncertainty inherent in these predictions. Gaussian process regression stands out as a Bayesian machine learning technique adept at addressing this multifaceted challenge. This paper introduces a novel approach that blends the robustness of this Bayesian technique wit…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 12 pages, 15 figures, to be published in IEEE Quantum Week 2024's conference proceeding

  44. arXiv:2408.11987  [pdf, other]

    cs.AI

    SimBench: A Rule-Based Multi-Turn Interaction Benchmark for Evaluating an LLM's Ability to Generate Digital Twins

    Authors: Jingquan Wang, Harry Zhang, Huzaifa Mustafa Unjhawala, Peter Negrut, Shu Wang, Khailanii Slaton, Radu Serban, Jin-Long Wu, Dan Negrut

    Abstract: We introduce SimBench, a benchmark designed to evaluate the proficiency of student large language models (S-LLMs) in generating digital twins (DTs) that can be used in simulators for virtual testing. Given a collection of S-LLMs, this benchmark enables the ranking of the S-LLMs based on their ability to produce high-quality DTs. We demonstrate this by comparing over 20 open- and closed-source S-LL…

    Submitted 21 August, 2024; originally announced August 2024.

  45. arXiv:2408.11304  [pdf, other]

    cs.LG

    FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts

    Authors: Hanzi Mei, Dongqi Cai, Ao Zhou, Shangguang Wang, Mengwei Xu

    Abstract: As Large Language Models (LLMs) push the boundaries of AI capabilities, their demand for data is growing. Much of this data is private and distributed across edge devices, making Federated Learning (FL) a de-facto alternative for fine-tuning (i.e., FedLLM). However, it faces significant challenges due to the inherent heterogeneity among clients, including varying data distributions and diverse tas…

    Submitted 20 August, 2024; originally announced August 2024.

  46. arXiv:2408.11278  [pdf, other]

    cs.CV

    The Key of Parameter Skew in Federated Learning

    Authors: Sifan Wang, Junfeng Liao, Ye Yuan, Riquan Zhang

    Abstract: Federated Learning (FL) has emerged as an excellent solution for performing deep learning on different data owners without exchanging raw data. However, statistical heterogeneity in FL presents a key challenge, leading to a phenomenon of skewness in local model parameter distributions that researchers have largely overlooked. In this work, we propose the concept of parameter skew to describe the p…

    Submitted 20 August, 2024; originally announced August 2024.

  47. arXiv:2408.11227  [pdf]

    eess.IV cs.AI cs.CV

    OCTCube: A 3D foundation model for optical coherence tomography that improves cross-dataset, cross-disease, cross-device and cross-modality analysis

    Authors: Zixuan Liu, Hanwen Xu, Addie Woicik, Linda G. Shapiro, Marian Blazes, Yue Wu, Cecilia S. Lee, Aaron Y. Lee, Sheng Wang

    Abstract: Optical coherence tomography (OCT) has become critical for diagnosing retinal diseases as it enables 3D images of the retina and optic nerve. OCT acquisition is fast, non-invasive, affordable, and scalable. Due to its broad applicability, massive numbers of OCT images have been accumulated in routine exams, making it possible to train large-scale foundation models that can generalize to various di…

    Submitted 20 August, 2024; originally announced August 2024.

  48. arXiv:2408.11198  [pdf, other]

    cs.SE cs.AI cs.NE

    EPiC: Cost-effective Search-based Prompt Engineering of LLMs for Code Generation

    Authors: Hamed Taherkhani, Melika Sepindband, Hung Viet Pham, Song Wang, Hadi Hemmati

    Abstract: Large Language Models (LLMs) have seen increasing use in various software development tasks, especially in code generation. The most advanced recent methods attempt to incorporate feedback from code execution into prompts to help guide LLMs in generating correct code, in an iterative process. While effective, these methods could be costly and time-consuming due to numerous interactions with the LL…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Submitted to TSE

  49. arXiv:2408.10673  [pdf, other]

    cs.CR

    Iterative Window Mean Filter: Thwarting Diffusion-based Adversarial Purification

    Authors: Hanrui Wang, Ruoxi Sun, Cunjian Chen, Minhui Xue, Lay-Ki Soon, Shuo Wang, Zhe Jin

    Abstract: Face authentication systems have brought significant convenience and advanced developments, yet they have become unreliable due to their sensitivity to inconspicuous perturbations, such as adversarial attacks. Existing defenses often exhibit weaknesses when facing various attack algorithms and adaptive attacks or compromise accuracy for enhanced security. To address these challenges, we have devel…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Under review

  50. arXiv:2408.10487  [pdf, other]

    cs.CV cs.AI

    MambaEVT: Event Stream based Visual Object Tracking using State Space Model

    Authors: Xiao Wang, Chao Wang, Shiao Wang, Xixi Wang, Zhicheng Zhao, Lin Zhu, Bo Jiang

    Abstract: Event camera-based visual tracking has drawn more and more attention in recent years due to the unique imaging principle and advantages of low energy consumption, high dynamic range, and dense temporal resolution. Current event-based tracking algorithms are gradually hitting their performance bottlenecks, due to the utilization of vision Transformer and the static template for target object locali…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: In Peer Review