
Showing 1–50 of 1,720 results for author: Wang, B

  1. arXiv:2408.15971  [pdf, other]

    cs.CL

    BattleAgentBench: A Benchmark for Evaluating Cooperation and Competition Capabilities of Language Models in Multi-Agent Systems

    Authors: Wei Wang, Dan Zhang, Tao Feng, Boyan Wang, Jie Tang

    Abstract: Large Language Models (LLMs) are becoming increasingly powerful and capable of handling complex tasks, e.g., building single agents and multi-agent systems. Compared to single agents, multi-agent systems have higher requirements for the collaboration capabilities of language models. Many benchmarks are proposed to evaluate their collaborative abilities. However, these benchmarks lack fine-grained…

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2408.15741  [pdf, other]

    cs.CV

    Segmentation-guided Layer-wise Image Vectorization with Gradient Fills

    Authors: Hengyu Zhou, Hui Zhang, Bin Wang

    Abstract: The widespread use of vector graphics creates a significant demand for vectorization methods. While recent learning-based techniques have shown their capability to create vector images of clear topology, filling these primitives with gradients remains a challenge. In this paper, we propose a segmentation-guided vectorization framework to convert raster images into concise vector graphics with radi…

    Submitted 28 August, 2024; originally announced August 2024.

  3. arXiv:2408.15079  [pdf, other]

    cs.CL cs.AI

    BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

    Authors: Guosheng Dong, Da Pan, Yiding Sun, Shusen Zhang, Zheng Liang, Xin Wu, Yanjun Shen, Fan Yang, Haoze Sun, Tianpeng Li, Mingan Lin, Jianhua Xu, Yufan Zhang, Xiaonan Nie, Lei Su, Bingning Wang, Wentao Zhang, Jiaxin Mao, Zenan Zhou, Weipeng Chen

    Abstract: The general capabilities of Large Language Models (LLM) highly rely on the composition and selection on extensive pretraining datasets, treated as commercial secrets by several institutions. To mitigate this issue, we open-source the details of a universally applicable data processing pipeline and validate its effectiveness and potential by introducing a competitive LLM baseline. Specifically, the…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 19 pages, 6 figures

  4. arXiv:2408.14606  [pdf]

    eess.IV cs.CV

    BreakNet: Discontinuity-Resilient Multi-Scale Transformer Segmentation of Retinal Layers

    Authors: Razieh Ganjee, Bingjie Wang, Lingyun Wang, Chengcheng Zhao, José-Alain Sahel, Shaohua Pi

    Abstract: Visible light optical coherence tomography (vis-OCT) is gaining traction for retinal imaging due to its high resolution and functional capabilities. However, the significant absorption of hemoglobin in the visible light range leads to pronounced shadow artifacts from retinal blood vessels, posing challenges for accurate layer segmentation. In this study, we present BreakNet, a multi-scale Transfor…

    Submitted 26 August, 2024; originally announced August 2024.

  5. arXiv:2408.13713  [pdf, other]

    quant-ph cs.LG

    Verifiable cloud-based variational quantum algorithms

    Authors: Junhong Yang, Banghai Wang, Junyu Quan, Qin Li

    Abstract: Variational quantum algorithms (VQAs) have shown potential for quantum advantage with noisy intermediate-scale quantum (NISQ) devices for quantum machine learning (QML). However, given the high cost and limited availability of quantum resources, delegating VQAs via cloud networks is a more practical solution for clients with limited quantum capabilities. Recently, Shingu et al. [Physical Review A,…

    Submitted 26 August, 2024; v1 submitted 24 August, 2024; originally announced August 2024.

  6. arXiv:2408.13545  [pdf, other]

    cs.CL

    IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering

    Authors: Ruosen Li, Barry Wang, Ruochen Li, Xinya Du

    Abstract: To evaluate Large Language Models (LLMs) for question answering (QA), traditional methods typically focus on directly assessing the immediate responses generated by the models based on the given question and context. In the common use case of humans seeking AI assistant's help in finding information, these non-interactive evaluations do not account for the dynamic nature of human-model conversatio…

    Submitted 24 August, 2024; originally announced August 2024.

  7. arXiv:2408.13226  [pdf, other]

    cs.CV

    D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching

    Authors: Jingyu Liu, Minquan Wang, Ye Ma, Bo Wang, Aozhu Chen, Quan Chen, Peng Jiang, Xirong Li

    Abstract: Videos showcasing specific products are increasingly important for E-commerce. Key moments naturally exist as the first appearance of a specific product, presentation of its distinctive features, the presence of a buying link, etc. Adding proper sound effects (SFX) to these key moments, or video decoration with SFX (VDSFX), is crucial for enhancing the user engaging experience. Previous studies ab…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 figures

  8. arXiv:2408.13193  [pdf, other]

    cs.CG

    Critical Point Extraction from Multivariate Functional Approximation

    Authors: Guanqun Ma, David Lenz, Tom Peterka, Hanqi Guo, Bei Wang

    Abstract: Advances in high-performance computing require new ways to represent large-scale scientific data to support data storage, data transfers, and data analysis within scientific workflows. Multivariate functional approximation (MFA) has recently emerged as a new continuous meshless representation that approximates raw discrete data with a set of piecewise smooth functions. An MFA model of data thus of…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: TopoInVis 2024, 11 pages with 1-page appendix

  9. arXiv:2408.13045  [pdf, other]

    cs.DS

    Adaptive complexity of log-concave sampling

    Authors: Huanjian Zhou, Baoxiang Wang, Masashi Sugiyama

    Abstract: In large-data applications, such as the inference process of diffusion models, it is desirable to design sampling algorithms with a high degree of parallelization. In this work, we study the adaptive complexity of sampling, which is the minimal number of sequential rounds required to achieve sampling given polynomially many queries executed in parallel at each round. For unconstrained sampling, we…

    Submitted 23 August, 2024; originally announced August 2024.

  10. arXiv:2408.12902  [pdf, other]

    cs.AI cs.CL cs.LG

    IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities

    Authors: Bin Wang, Chunyu Xie, Dawei Leng, Yuhui Yin

    Abstract: In the field of multimodal large language models (MLLMs), common methods typically involve unfreezing the language model during training to foster profound visual understanding. However, the fine-tuning of such models with vision-language data often leads to a diminution of their natural language processing (NLP) capabilities. To avoid this performance degradation, a straightforward solution is to…

    Submitted 23 August, 2024; originally announced August 2024.

  11. arXiv:2408.12680  [pdf, other]

    cs.AI

    Can LLMs Understand Social Norms in Autonomous Driving Games?

    Authors: Boxuan Wang, Haonan Duan, Yanhao Feng, Xu Chen, Yongjie Fu, Zhaobin Mo, Xuan Di

    Abstract: Social norm is defined as a shared standard of acceptable behavior in a society. The emergence of social norms fosters coordination among agents without any hard-coded rules, which is crucial for the large-scale deployment of AVs in an intelligent transportation system. This paper explores the application of LLMs in understanding and modeling social norms in autonomous driving games. We introduce…

    Submitted 22 August, 2024; originally announced August 2024.

  12. arXiv:2408.12534  [pdf, other]

    eess.IV cs.AI cs.CV

    Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge

    Authors: Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Ershuai Wang, Qin Zhou, Ziyan Huang, Pengju Lyu, Jian He, Bo Wang

    Abstract: Organ and cancer segmentation in abdomen Computed Tomography (CT) scans is the prerequisite for precise cancer diagnosis and treatment. Most existing benchmarks and algorithms are tailored to specific cancer types, limiting their ability to provide comprehensive cancer analysis. This work presents the first international competition on abdominal organ and pan-cancer segmentation by providing a lar…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: MICCAI 2024 FLARE Challenge Summary

  13. arXiv:2408.12128  [pdf, other]

    cs.AI cs.CV

    Diffusion-Based Visual Art Creation: A Survey and New Perspectives

    Authors: Bingyuan Wang, Qifeng Chen, Zeyu Wang

    Abstract: The integration of generative AI in visual art has revolutionized not only how visual content is created but also how AI interacts with and reflects the underlying domain knowledge. This survey explores the emerging realm of diffusion-based visual art creation, examining its development from both artistic and technical perspectives. We structure the survey into three phases, data feature and frame…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 35 pages, 9 figures

  14. arXiv:2408.12119  [pdf, other]

    cs.CR cs.AI

    Understanding Data Reconstruction Leakage in Federated Learning from a Theoretical Perspective

    Authors: Zifan Wang, Binghui Zhang, Meng Pang, Yuan Hong, Binghui Wang

    Abstract: Federated learning (FL) is an emerging collaborative learning paradigm that aims to protect data privacy. Unfortunately, recent works show FL algorithms are vulnerable to the serious data reconstruction attacks. However, existing works lack a theoretical foundation on to what extent the devices' data can be reconstructed and the effectiveness of these attacks cannot be compared fairly due to their…

    Submitted 22 August, 2024; originally announced August 2024.

  15. arXiv:2408.12071  [pdf, other]

    cs.LG

    Multi-Task Curriculum Graph Contrastive Learning with Clustering Entropy Guidance

    Authors: Chusheng Zeng, Bocheng Wang, Jinghui Yuan, Rong Wang, Mulin Chen

    Abstract: Recent advances in unsupervised deep graph clustering have been significantly promoted by contrastive learning. Despite the strides, most graph contrastive learning models face challenges: 1) graph augmentation is used to improve learning diversity, but commonly used random augmentation methods may destroy inherent semantics and cause noise; 2) the fixed positive and negative sample selection stra…

    Submitted 21 August, 2024; originally announced August 2024.

  16. arXiv:2408.11878  [pdf, other]

    cs.CL cs.CE q-fin.CP

    Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

    Authors: Qianqian Xie, Dong Li, Mengxi Xiao, Zihao Jiang, Ruoyu Xiang, Xiao Zhang, Zhengyu Chen, Yueru He, Weiguang Han, Yuzhe Yang, Shunian Chen, Yifei Zhang, Lihang Shen, Daniel Kim, Zhiwei Liu, Zheheng Luo, Yangyang Yu, Yupeng Cao, Zhiyang Deng, Zhiyuan Yao, Haohang Li, Duanyu Feng, Yongfu Dai, VijayaSai Somasundaram, Peng Lu, et al. (14 additional authors not shown)

    Abstract: Large language models (LLMs) have advanced financial applications, yet they often lack sufficient financial knowledge and struggle with tasks involving multi-modal inputs like tables and time series data. To address these limitations, we introduce Open-FinLLMs, a series of Financial LLMs. We begin with FinLLaMA, pre-trained on a 52 billion token financial corpus, incorporating text, table…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 33 pages, 13 figures

  17. arXiv:2408.11839  [pdf]

    cs.LG cs.AI

    Adaptive Friction in Deep Learning: Enhancing Optimizers with Sigmoid and Tanh Function

    Authors: Hongye Zheng, Bingxing Wang, Minheng Xiao, Honglin Qin, Zhizhong Wu, Lianghao Tan

    Abstract: Adaptive optimizers are pivotal in guiding the weight updates of deep neural networks, yet they often face challenges such as poor generalization and oscillation issues. To counter these, we introduce sigSignGrad and tanhSignGrad, two novel optimizers that integrate adaptive friction coefficients based on the Sigmoid and Tanh functions, respectively. These algorithms leverage short-term gradient i…

    Submitted 6 August, 2024; originally announced August 2024.

  18. arXiv:2408.10688  [pdf, other]

    cs.CV

    TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning

    Authors: Bin Wang, Wenqian Wang

    Abstract: Recently, large-scale pre-trained vision-language models (e.g., CLIP), have garnered significant attention thanks to their powerful representative capabilities. This inspires researchers in transferring the knowledge from these large pre-trained models to other task-specific models, e.g., Video Action Recognition (VAR) models, via particularly leveraging side networks to enhance the efficiency of…

    Submitted 20 August, 2024; originally announced August 2024.

  19. arXiv:2408.10573  [pdf, other]

    cs.CL cs.AI

    Putting People in LLMs' Shoes: Generating Better Answers via Question Rewriter

    Authors: Junhao Chen, Bowen Wang, Zhouqiang jiang, Yuta Nakashima

    Abstract: Large Language Models (LLMs) have demonstrated significant capabilities, particularly in the domain of question answering (QA). However, their effectiveness in QA is often undermined by the vagueness of user questions. To address this issue, we introduce single-round instance-level prompt optimization, referred to as question rewriter. By enhancing the intelligibility of human questions for black-…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 7 pages, 4 figures, 5 tables

  20. arXiv:2408.10556  [pdf, other]

    cs.AI cs.LG

    Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

    Authors: Yun Qu, Boyuan Wang, Jianzhun Shao, Yuhang Jiang, Chen Chen, Zhenbin Ye, Lin Liu, Junfeng Yang, Lin Lai, Hongyang Qin, Minwen Deng, Juchao Zhuo, Deheng Ye, Qiang Fu, Wei Yang, Guang Yang, Lanxiao Huang, Xiangyang Ji

    Abstract: The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre-collected offline datasets that represent real-world complexities and practical applications. However, existing datasets often fall short in their simplicity and lack of realism. To address this gap, we propose Hokoff, a comprehens…

    Submitted 20 August, 2024; originally announced August 2024.

  21. arXiv:2408.10533  [pdf, other]

    cs.CV

    FAGStyle: Feature Augmentation on Geodesic Surface for Zero-shot Text-guided Diffusion Image Style Transfer

    Authors: Yuexing Han, Liheng Ruan, Bing Wang

    Abstract: The goal of image style transfer is to render an image guided by a style reference while maintaining the original content. Existing image-guided methods rely on specific style reference images, restricting their wider application and potentially compromising result quality. As a flexible alternative, text-guided methods allow users to describe the desired style using text prompts. Despite their ve…

    Submitted 20 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  22. arXiv:2408.09885  [pdf, other]

    cs.GT

    Joint Auction in the Online Advertising Market

    Authors: Zhen Zhang, Weian Li, Yahui Lei, Bingzhe Wang, Zhicheng Zhang, Qi Qi, Qiang Liu, Xingxing Wang

    Abstract: Online advertising is a primary source of income for e-commerce platforms. In the current advertising pattern, the oriented targets are the online store owners who are willing to pay extra fees to enhance the position of their stores. On the other hand, brand suppliers are also desirable to advertise their products in stores to boost brand sales. However, the currently used advertising mode cannot…

    Submitted 19 August, 2024; originally announced August 2024.

  23. arXiv:2408.09462  [pdf, other]

    cs.MM

    SpeechEE: A Novel Benchmark for Speech Event Extraction

    Authors: Bin Wang, Meishan Zhang, Hao Fei, Yu Zhao, Bobo Li, Shengqiong Wu, Wei Ji, Min Zhang

    Abstract: Event extraction (EE) is a critical direction in the field of information extraction, laying an important foundation for the construction of structured knowledge bases. EE from text has received ample research and attention for years, yet there can be numerous real-world applications that require direct information acquisition from speech signals, online meeting minutes, interview summaries, press…

    Submitted 23 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

  24. arXiv:2408.09262  [pdf, other]

    cs.LG cs.AI cs.LO

    PREMAP: A Unifying PREiMage APproximation Framework for Neural Networks

    Authors: Xiyue Zhang, Benjie Wang, Marta Kwiatkowska, Huan Zhang

    Abstract: Most methods for neural network verification focus on bounding the image, i.e., set of outputs for a given input set. This can be used to, for example, check the robustness of neural network predictions to bounded perturbations of an input. However, verifying properties concerning the preimage, i.e., the set of inputs satisfying an output property, requires abstractions in the input space. We pres…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.03686

  25. arXiv:2408.09144  [pdf, other]

    cs.CV

    SSNeRF: Sparse View Semi-supervised Neural Radiance Fields with Augmentation

    Authors: Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, Robby T. Tan

    Abstract: Sparse view NeRF is challenging because limited input images lead to an under constrained optimization problem for volume rendering. Existing methods address this issue by relying on supplementary information, such as depth maps. However, generating this supplementary information accurately remains problematic and often leads to NeRF producing images with undesired artifacts. To address these arti…

    Submitted 17 August, 2024; originally announced August 2024.

  26. arXiv:2408.08841  [pdf, other]

    cs.CL

    FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats

    Authors: Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Baoxin Wang, Dayong Wu, Qingfu Zhu, Wanxiang Che

    Abstract: The table reasoning task aims to answer the question according to the given table. Currently, using Large Language Models (LLMs) is the predominant method for table reasoning. Most existing methods employ a fixed tabular format to represent the table, which could limit the performance. Given that each instance requires different capabilities and models possess varying abilities, we assert that dif…

    Submitted 27 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  27. arXiv:2408.08541  [pdf, other]

    cs.CL cs.LG

    Where is the signal in tokenization space?

    Authors: Renato Lui Geh, Honghua Zhang, Kareem Ahmed, Benjie Wang, Guy Van den Broeck

    Abstract: Large Language Models (LLMs) are typically shipped with tokenizers that deterministically encode text into so-called canonical token sequences, to which the LLMs assign probability values. One common assumption is that the probability of a piece of text is the probability of its canonical token sequence. However, the tokenization of a string is not unique: e.g., the Llama2 tokenizer encodes Tokens…

    Submitted 16 August, 2024; originally announced August 2024.

  28. arXiv:2408.08395  [pdf, other]

    cs.GT

    Uncoupled and Convergent Learning in Monotone Games under Bandit Feedback

    Authors: Jing Dong, Baoxiang Wang, Yaoliang Yu

    Abstract: We study the problem of no-regret learning algorithms for general monotone and smooth games and their last-iterate convergence properties. Specifically, we investigate the problem under bandit feedback and strongly uncoupled dynamics, which allows modular development of the multi-player system that applies to a wide range of real applications. We propose a mirror-descent-based algorithm, which con…

    Submitted 15 August, 2024; originally announced August 2024.

  29. arXiv:2408.08208  [pdf, other]

    cs.IR cs.AI

    LLM4DSR: Leveraing Large Language Model for Denoising Sequential Recommendation

    Authors: Bohao Wang, Feng Liu, Jiawei Chen, Yudi Wu, Xingyu Lou, Jun Wang, Yan Feng, Chun Chen, Can Wang

    Abstract: Sequential recommendation systems fundamentally rely on users' historical interaction sequences, which are often contaminated by noisy interactions. Identifying these noisy interactions accurately without additional information is particularly difficult due to the lack of explicit supervisory signals to denote noise. Large Language Models (LLMs), equipped with extensive open knowledge and semantic…

    Submitted 15 August, 2024; originally announced August 2024.

  30. arXiv:2408.08067  [pdf, other]

    cs.CL cs.AI

    RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation

    Authors: Dongyu Ru, Lin Qiu, Xiangkun Hu, Tianhang Zhang, Peng Shi, Shuaichen Chang, Cheng Jiayang, Cunxiang Wang, Shichao Sun, Huanyu Li, Zizhao Zhang, Binjie Wang, Jiarong Jiang, Tong He, Zhiguo Wang, Pengfei Liu, Yue Zhang, Zheng Zhang

    Abstract: Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements. In this paper, we propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for b…

    Submitted 16 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: Under Review. Github Repo: https://github.com/amazon-science/RAGChecker

  31. arXiv:2408.08023  [pdf, other]

    cs.LG cs.AI

    Causal Discovery from Time-Series Data with Short-Term Invariance-Based Convolutional Neural Networks

    Authors: Rujia Shen, Boran Wang, Chao Zhao, Yi Guan, Jingchi Jiang

    Abstract: Causal discovery from time-series data aims to capture both intra-slice (contemporaneous) and inter-slice (time-lagged) causality between variables within the temporal chain, which is crucial for various scientific disciplines. Compared to causal discovery from non-time-series data, causal discovery from time-series data necessitates more serialized samples with a larger amount of observed time st…

    Submitted 15 August, 2024; originally announced August 2024.

  32. arXiv:2408.06941  [pdf, other]

    cs.IR

    OpenResearcher: Unleashing AI for Accelerated Scientific Research

    Authors: Yuxiang Zheng, Shichao Sun, Lin Qiu, Dongyu Ru, Cheng Jiayang, Xuefeng Li, Jifan Lin, Binjie Wang, Yun Luo, Renjie Pan, Yang Xu, Qingkai Min, Zizhao Zhang, Yiwen Wang, Wenjie Li, Pengfei Liu

    Abstract: The rapid growth of scientific literature imposes significant challenges for researchers endeavoring to stay updated with the latest advancements in their fields and delve into new areas. We introduce OpenResearcher, an innovative platform that leverages Artificial Intelligence (AI) techniques to accelerate the research process by answering diverse questions from researchers. OpenResearcher is bui…

    Submitted 13 August, 2024; originally announced August 2024.

  33. arXiv:2408.06574  [pdf, other]

    cs.CL

    SparkRA: A Retrieval-Augmented Knowledge Service System Based on Spark Large Language Model

    Authors: Dayong Wu, Jiaqi Li, Baoxin Wang, Honghong Zhao, Siyuan Xue, Yanjie Yang, Zhijun Chang, Rui Zhang, Li Qian, Bo Wang, Shijin Wang, Zhixiong Zhang, Guoping Hu

    Abstract: Large language models (LLMs) have shown remarkable achievements across various language tasks. To enhance the performance of LLMs in scientific literature services, we developed the scientific literature LLM (SciLit-LLM) through pre-training and supervised fine-tuning on scientific literature, building upon the iFLYTEK Spark LLM. Furthermore, we present a knowledge service system Spark Research Ass…

    Submitted 12 August, 2024; originally announced August 2024.

  34. arXiv:2408.05780  [pdf, other]

    cs.CV

    U-DECN: End-to-End Underwater Object Detection ConvNet with Improved DeNoising Training

    Authors: Zhuoyan Liu, Bo Wang, Ye Li

    Abstract: Underwater object detection has higher requirements of running speed and deployment efficiency for the detector due to its specific environmental challenges. NMS of two- or one-stage object detectors and transformer architecture of query-based end-to-end object detectors are not conducive to deployment on underwater embedded devices with limited processing power. As for the detrimental effect of u…

    Submitted 11 August, 2024; originally announced August 2024.

  35. arXiv:2408.05723  [pdf, other]

    cs.LG cs.CR cs.CV

    Deep Learning with Data Privacy via Residual Perturbation

    Authors: Wenqi Tao, Huaming Ling, Zuoqiang Shi, Bao Wang

    Abstract: Protecting data privacy in deep learning (DL) is of crucial importance. Several celebrated privacy notions have been established and used for privacy-preserving DL. However, many existing mechanisms achieve privacy at the cost of significant utility degradation and computational overhead. In this paper, we propose a stochastic differential equation-based residual perturbation for privacy-preservin…

    Submitted 11 August, 2024; originally announced August 2024.

  36. arXiv:2408.05497  [pdf, other]

    cs.CL

    MABR: A Multilayer Adversarial Bias Removal Approach Without Prior Bias Knowledge

    Authors: Maxwell J. Yin, Boyu Wang, Charles Ling

    Abstract: Models trained on real-world data often mirror and exacerbate existing social biases. Traditional methods for mitigating these biases typically require prior knowledge of the specific biases to be addressed, such as gender or racial biases, and the social groups associated with each instance. In this paper, we introduce a novel adversarial training strategy that operates independently of prior bia…

    Submitted 10 August, 2024; originally announced August 2024.

  37. arXiv:2408.05440  [pdf]

    cs.CV eess.IV

    Content-decoupled Contrastive Learning-based Implicit Degradation Modeling for Blind Image Super-Resolution

    Authors: Jiang Yuan, Ji Ma, Bo Wang, Weiming Hu

    Abstract: Implicit degradation modeling-based blind super-resolution (SR) has attracted more increasing attention in the community due to its excellent generalization to complex degradation scenarios and wide application range. How to extract more discriminative degradation representations and fully adapt them to specific image features is the key to this task. In this paper, we propose a new Content-decoup…

    Submitted 10 August, 2024; originally announced August 2024.

  38. arXiv:2408.05435  [pdf, other]

    quant-ph cs.LG

    SuperEncoder: Towards Universal Neural Approximate Quantum State Preparation

    Authors: Yilun Zhao, Bingmeng Wang, Wenle Jiang, Xiwei Pan, Bing Li, Yinhe Han, Ying Wang

    Abstract: Numerous quantum algorithms operate under the assumption that classical data has already been converted into quantum states, a process termed Quantum State Preparation (QSP). However, achieving precise QSP requires a circuit depth that scales exponentially with the number of qubits, making it a substantial obstacle in harnessing quantum advantage. Recent research suggests using a Parameterized Qua…

    Submitted 10 August, 2024; originally announced August 2024.

  39. arXiv:2408.05393  [pdf, other]

    stat.ML cs.LG

    fastkqr: A Fast Algorithm for Kernel Quantile Regression

    Authors: Qian Tang, Yuwen Gu, Boxiang Wang

    Abstract: Quantile regression is a powerful tool for robust and heterogeneous learning that has seen applications in a diverse range of applied areas. However, its broader application is often hindered by the substantial computational demands arising from the non-smooth quantile loss function. In this paper, we introduce a novel algorithm named fastkqr, which significantly advances the computation of quanti…

    Submitted 9 August, 2024; originally announced August 2024.

  40. arXiv:2408.05178  [pdf, other]

    cs.LG

    ECG-FM: An Open Electrocardiogram Foundation Model

    Authors: Kaden McKeen, Laura Oliva, Sameer Masood, Augustin Toma, Barry Rubin, Bo Wang

    Abstract: The electrocardiogram (ECG) is a ubiquitous diagnostic test. Conventional task-specific ECG analysis models require large numbers of expensive ECG annotations or associated labels to train. Transfer learning techniques have been shown to improve generalization and reduce reliance on labeled data. We present ECG-FM, an open foundation model for ECG analysis, and conduct a comprehensive study perfor…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 22 pages, 7 figures, 10 tables

    MSC Class: 68T01; ACM Class: I.2.0

  41. arXiv:2408.04836  [pdf, other]

    cs.CG cs.DC

    Distributed Augmentation, Hypersweeps, and Branch Decomposition of Contour Trees for Scientific Exploration

    Authors: Mingzhe Li, Hamish Carr, Oliver Rübel, Bei Wang, Gunther H. Weber

    Abstract: Contour trees describe the topology of level sets in scalar fields and are widely used in topological data analysis and visualization. A main challenge of utilizing contour trees for large-scale scientific data is their computation at scale using high-performance computing. To address this challenge, recent work has introduced distributed hierarchical contour trees for distributed computation and…

    Submitted 8 August, 2024; originally announced August 2024.

  42. arXiv:2408.04229  [pdf, ps, other]

    cs.LG cs.AI

    Probabilistic Circuits for Cumulative Distribution Functions

    Authors: Oliver Broadrick, William Cao, Benjie Wang, Martin Trapp, Guy Van den Broeck

    Abstract: A probabilistic circuit (PC) succinctly expresses a function that represents a multivariate probability distribution and, given sufficient structural properties of the circuit, supports efficient probabilistic inference. Typically a PC computes the probability mass (or density) function (PMF or PDF) of the distribution. We consider PCs instead computing the cumulative distribution function (CDF).…

    Submitted 8 August, 2024; originally announced August 2024.

    Journal ref: In Proceedings of the UAI Workshop on Tractable Probabilistic Modeling (TPM), 2024

  43. arXiv:2408.03560  [pdf, other]

    cs.LG stat.ML

    In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models

    Authors: Ayrton San Joaquin, Bin Wang, Zhengyuan Liu, Nicholas Asher, Brian Lim, Philippe Muller, Nancy Chen

    Abstract: Despite advancements, fine-tuning Large Language Models (LLMs) remains costly due to the extensive parameter count and substantial data requirements for model generalization. Accessibility to computing resources remains a barrier for the open-source community. To address this challenge, we propose the In2Core algorithm, which selects a coreset by analyzing the correlation between training and eval…

    Submitted 7 August, 2024; originally announced August 2024.

  44. arXiv:2408.03361  [pdf, other]

    eess.IV cs.CV

    GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

    Authors: Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J Seibel, Junjun He, Yu Qiao

    Abstract: Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop benchmarks to evaluate LVLMs' effectiveness in various medical applications. Curren…

    Submitted 9 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  45. arXiv:2408.03322  [pdf, other]

    eess.IV cs.CV

    Segment Anything in Medical Images and Videos: Benchmark and Deployment

    Authors: Jun Ma, Sumin Kim, Feifei Li, Mohammed Baharoon, Reza Asakereh, Hongwei Lyu, Bo Wang

    Abstract: Recent advances in segmentation foundation models have enabled accurate and efficient segmentation across a wide range of natural images and videos, but their utility to medical data remains unclear. In this work, we first present a comprehensive benchmarking of the Segment Anything Model 2 (SAM2) across 11 medical image modalities and videos and point out its strengths and weaknesses by comparing…

    Submitted 6 August, 2024; originally announced August 2024.

  46. arXiv:2408.02091  [pdf, other]

    cs.CV

    Past Movements-Guided Motion Representation Learning for Human Motion Prediction

    Authors: Junyu Shi, Baoxuan Wang

    Abstract: Human motion prediction based on 3D skeleton is a significant challenge in computer vision, primarily focusing on the effective representation of motion. In this paper, we propose a self-supervised learning framework designed to enhance motion representation. This framework consists of two stages: first, the network is pretrained through the self-reconstruction of past sequences, and the guided re…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 13 pages, 4 figures

    MSC Class: 68T07 (Primary) 68T45 (Secondary)

    ACM Class: I.2.10; I.4.10; I.4.m

  47. arXiv:2408.01967  [pdf, other]

    cs.LG

    A multi-task deep learning approach for lane-level pavement performance prediction with segment-level data

    Authors: Bo Wang, Wenbo Zhang, Yunpeng LI

    Abstract: The elaborate pavement performance prediction is an important premise of implementing preventive maintenance. Our survey reveals that in practice, the pavement performance is usually measured at segment-level, where a unique performance value is obtained for all lanes within one segment of 1 km length. It still lacks more elaborate performance analysis at lane-level due to costly data collection a…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 24 pages, 8 figures, 4 tables

  48. arXiv:2408.01933  [pdf, other]

    cs.CL cs.AI

    DiReCT: Diagnostic Reasoning for Clinical Notes via Large Language Models

    Authors: Bowen Wang, Jiuyang Chang, Yiming Qian, Guoxin Chen, Junhao Chen, Zhouqiang Jiang, Jiahao Zhang, Yuta Nakashima, Hajime Nagahara

    Abstract: Large language models (LLMs) have recently showcased remarkable capabilities, spanning a wide range of tasks and applications, including those in the medical domain. Models like GPT-4 excel in medical question answering but may face challenges in the lack of interpretability when handling complex tasks in real clinical settings. We thus introduce the diagnostic reasoning dataset for clinical notes…

    Submitted 6 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: 9 pages, 6 figures

  49. arXiv:2408.01880  [pdf, other]

    cs.AI cs.LG

    Walk Wisely on Graph: Knowledge Graph Reasoning with Dual Agents via Efficient Guidance-Exploration

    Authors: Zijian Wang, Bin Wang, Haifeng Jing, Huayu Li, Hongbo Dou

    Abstract: In recent years, multi-hop reasoning has been widely studied for knowledge graph (KG) reasoning due to its efficacy and interpretability. However, previous multi-hop reasoning approaches are subject to two primary shortcomings. First, agents struggle to learn effective and robust policies at the early phase due to sparse rewards. Second, these approaches often falter on specific datasets like sparse…

    Submitted 3 August, 2024; originally announced August 2024.

  50. arXiv:2408.01044  [pdf, other]

    cs.CV

    Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model

    Authors: Yang Jin, Lei Zhang, Shi Yan, Bin Fan, Binglu Wang

    Abstract: Gaze object prediction (GOP) aims to predict the category and location of the object that a human is looking at. Previous methods utilized box-level supervision to identify the object that a person is looking at, but struggled with semantic ambiguity, i.e., a single box may contain several items since objects are close together. The Vision foundation model (VFM) has improved in object segmentation u…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024