
Showing 1–50 of 1,342 results for author: Zhang, B

Searching in archive cs.
  1. arXiv:2409.03643  [pdf, other]

    cs.CV cs.CL

    CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation

    Authors: Bin Wang, Fan Wu, Linke Ouyang, Zhuangcheng Gu, Rui Zhang, Renqiu Xia, Bo Zhang, Conghui He

    Abstract: Formula recognition presents significant challenges due to the complicated structure and varied notation of mathematical expressions. Despite continuous advancements in formula recognition models, the evaluation metrics employed by these models, such as BLEU and Edit Distance, still exhibit notable limitations. They overlook the fact that the same formula has diverse representations and is highly…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Project Website: https://github.com/opendatalab/UniMERNet/tree/main/cdm

  2. arXiv:2409.01661  [pdf, other]

    cs.CR cs.CV cs.LG

    $S^2$NeRF: Privacy-preserving Training Framework for NeRF

    Authors: Bokang Zhang, Yanglin Zhang, Zhikun Zhang, Jinglan Yang, Lingying Huang, Junfeng Wu

    Abstract: Neural Radiance Fields (NeRF) have revolutionized 3D computer vision and graphics, facilitating novel view synthesis and influencing sectors like extended reality and e-commerce. However, NeRF's dependence on extensive data collection, including sensitive scene image data, introduces significant privacy risks when users upload this data for model training. To address this concern, we first propose…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: To appear in the ACM Conference on Computer and Communications Security (CCS'24), October 14-18, 2024, Salt Lake City, UT, USA

  3. arXiv:2409.01540  [pdf, other]

    cs.CV cs.AI cs.LG

    Long-Range Biometric Identification in Real World Scenarios: A Comprehensive Evaluation Framework Based on Missions

    Authors: Deniz Aykac, Joel Brogan, Nell Barber, Ryan Shivers, Bob Zhang, Dallas Sacca, Ryan Tipton, Gavin Jager, Austin Garret, Matthew Love, Jim Goddard, David Cornett III, David S. Bolme

    Abstract: The considerable body of data available for evaluating biometric recognition systems in Research and Development (R&D) environments has contributed to the increasingly common problem of target performance mismatch. Biometric algorithms are frequently tested against data that may not reflect the real world applications they target. From a Testing and Evaluation (T&E) standpoint, this domain misma…

    Submitted 2 September, 2024; originally announced September 2024.

  4. arXiv:2409.01514  [pdf, other]

    cs.CV cs.AI cs.LG

    From Data to Insights: A Covariate Analysis of the IARPA BRIAR Dataset for Multimodal Biometric Recognition Algorithms at Altitude and Range

    Authors: David S. Bolme, Deniz Aykac, Ryan Shivers, Joel Brogan, Nell Barber, Bob Zhang, Laura Davies, David Cornett III

    Abstract: This paper examines covariate effects on fused whole body biometrics performance in the IARPA BRIAR dataset, specifically focusing on UAV platforms, elevated positions, and distances up to 1000 meters. The dataset includes outdoor videos compared with indoor images and controlled gait recordings. Normalized raw fusion scores relate directly to predicted false accept rates (FAR), offering an intuit…

    Submitted 2 September, 2024; originally announced September 2024.

  5. arXiv:2409.01245  [pdf, other]

    cs.LG cs.AI cs.RO

    Revisiting Safe Exploration in Safe Reinforcement learning

    Authors: David Eckel, Baohe Zhang, Joschka Bödecker

    Abstract: Safe reinforcement learning (SafeRL) extends standard reinforcement learning with the idea of safety, where safety is typically defined through the constraint of the expected cost return of a trajectory being below a set limit. However, this metric fails to distinguish how costs accrue, treating infrequent severe cost events as equal to frequent mild ones, which can lead to riskier behaviors and r…

    Submitted 2 September, 2024; originally announced September 2024.

  6. arXiv:2409.00597  [pdf, other]

    cs.MM cs.CL

    Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model

    Authors: Fuqiang Niu, Zebang Cheng, Xianghua Fu, Xiaojiang Peng, Genan Dai, Yin Chen, Hu Huang, Bowen Zhang

    Abstract: Stance detection, which aims to identify public opinion towards specific targets using social media data, is an important yet challenging task. With the proliferation of diverse multimodal social media content including text, and images multimodal stance detection (MSD) has become a crucial research area. However, existing MSD studies have focused on modeling stance within individual text-image pa…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: ACM MM2024

  7. arXiv:2409.00327  [pdf, other]

    cs.CR cs.AI cs.DC

    Demo: FedCampus: A Real-world Privacy-preserving Mobile Application for Smart Campus via Federated Learning & Analytics

    Authors: Jiaxiang Geng, Beilong Tang, Boyan Zhang, Jiaqi Shao, Bing Luo

    Abstract: In this demo, we introduce FedCampus, a privacy-preserving mobile application for smart campus with federated learning (FL) and federated analytics (FA). FedCampus enables cross-platform on-device FL/FA for both iOS and Android, supporting continuously models and algorithms deployment (MLOps). Our app integrates privacy-preserving processed data via differential privacy (DP…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: 2 pages, 3 figures, accepted for publication in ACM Mobihoc 2024

  8. arXiv:2409.00303  [pdf, other]

    cs.RO

    Rapid and Robust Trajectory Optimization for Humanoids

    Authors: Bohao Zhang, Ram Vasudevan

    Abstract: Performing trajectory design for humanoid robots with high degrees of freedom is computationally challenging. The trajectory design process also often involves carefully selecting various hyperparameters and requires a good initial guess which can further complicate the development process. This work introduces a generalized gait optimization framework that directly generates smooth and physically…

    Submitted 30 August, 2024; originally announced September 2024.

  9. arXiv:2409.00099  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Query-by-Example Keyword Spotting Using Spectral-Temporal Graph Attentive Pooling and Multi-Task Learning

    Authors: Zhenyu Wang, Shuyu Kong, Li Wan, Biqiao Zhang, Yiteng Huang, Mumin Jin, Ming Sun, Xin Lei, Zhaojun Yang

    Abstract: Existing keyword spotting (KWS) systems primarily rely on predefined keyword phrases. However, the ability to recognize customized keywords is crucial for tailoring interactions with intelligent devices. In this paper, we present a novel Query-by-Example (QbyE) KWS system that employs spectral-temporal graph attentive pooling and multi-task learning. This framework aims to effectively learn speake…

    Submitted 26 August, 2024; originally announced September 2024.

    Journal ref: INTERSPEECH 2024

  10. arXiv:2408.16559  [pdf, other]

    cs.SE cs.RO

    DroneWiS: Automated Simulation Testing of small Unmanned Aerial Systems in Realistic Windy Conditions

    Authors: Bohan Zhang, Ankit Agrawal

    Abstract: The continuous evolution of small Unmanned Aerial Systems (sUAS) demands advanced testing methodologies to ensure their safe and reliable operations in the real-world. To push the boundaries of sUAS simulation testing in realistic environments, we previously developed the DroneReqValidator (DRV) platform, allowing developers to automatically conduct simulation testing in digital twin of earth. In…

    Submitted 29 August, 2024; originally announced August 2024.

    Journal ref: ASE 2024 - Tool Demo Track

  11. arXiv:2408.15256  [pdf, other]

    cs.HC cs.AI

    Improving Ontology Requirements Engineering with OntoChat and Participatory Prompting

    Authors: Yihang Zhao, Bohui Zhang, Xi Hu, Shuyin Ouyang, Jongmo Kim, Nitisha Jain, Jacopo de Berardinis, Albert Meroño-Peñuela, Elena Simperl

    Abstract: Past ontology requirements engineering (ORE) has primarily relied on manual methods, such as interviews and collaborative forums, to gather user requirements from domain experts, especially in large projects. Current OntoChat offers a framework for ORE that utilises large language models (LLMs) to streamline the process through four key functions: user story creation, competency question (CQ) extr…

    Submitted 29 August, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

  12. arXiv:2408.14380  [pdf, other]

    cs.CL cs.AI

    Probing Causality Manipulation of Large Language Models

    Authors: Chenyang Zhang, Haibo Tong, Bin Zhang, Dongyu Zhang

    Abstract: Large language models (LLMs) have shown various ability on natural language processing, including problems about causality. It is not intuitive for LLMs to command causality, since pretrained models usually work on statistical associations, and do not focus on causes and effects in sentences. So that probing internal manipulation of causality is necessary for LLMs. This paper proposes a novel appr…

    Submitted 26 August, 2024; originally announced August 2024.

  13. arXiv:2408.13491  [pdf, other]

    cs.CV

    ESA: Annotation-Efficient Active Learning for Semantic Segmentation

    Authors: Jinchao Ge, Zeyu Zhang, Minh Hieu Phan, Bowen Zhang, Akide Liu, Yang Zhao

    Abstract: Active learning enhances annotation efficiency by selecting the most revealing samples for labeling, thereby reducing reliance on extensive human input. Previous methods in semantic segmentation have centered on individual pixels or small areas, neglecting the rich patterns in natural images and the power of advanced pre-trained models. To address these challenges, we propose three key contributio…

    Submitted 24 August, 2024; originally announced August 2024.

  14. arXiv:2408.13355  [pdf, other]

    cs.SD cs.AI eess.AS

    Disentangled Training with Adversarial Examples For Robust Small-footprint Keyword Spotting

    Authors: Zhenyu Wang, Li Wan, Biqiao Zhang, Yiteng Huang, Shang-Wen Li, Ming Sun, Xin Lei, Zhaojun Yang

    Abstract: A keyword spotting (KWS) engine that is continuously running on device is exposed to various speech signals that are usually unseen before. It is a challenging problem to build a small-footprint and high-performing KWS model with robustness under different acoustic environments. In this paper, we explore how to effectively apply adversarial examples to improve KWS robustness. We propose datasource…

    Submitted 23 August, 2024; originally announced August 2024.

    Journal ref: ICASSP 2023

  15. arXiv:2408.12119  [pdf, other]

    cs.CR cs.AI

    Understanding Data Reconstruction Leakage in Federated Learning from a Theoretical Perspective

    Authors: Zifan Wang, Binghui Zhang, Meng Pang, Yuan Hong, Binghui Wang

    Abstract: Federated learning (FL) is an emerging collaborative learning paradigm that aims to protect data privacy. Unfortunately, recent works show FL algorithms are vulnerable to the serious data reconstruction attacks. However, existing works lack a theoretical foundation on to what extent the devices' data can be reconstructed and the effectiveness of these attacks cannot be compared fairly due to their…

    Submitted 22 August, 2024; originally announced August 2024.

  16. arXiv:2408.10670  [pdf]

    cs.CV eess.IV

    A Noncontact Technique for Wave Measurement Based on Thermal Stereography and Deep Learning

    Authors: Deyu Li, Longfei Xiao, Handi Wei, Yan Li, Binghua Zhang

    Abstract: The accurate measurement of the wave field and its spatiotemporal evolution is essential in many hydrodynamic experiments and engineering applications. The binocular stereo imaging technique has been widely used to measure waves. However, the optical properties of indoor water surfaces, including transparency, specular reflection, and texture absence, pose challenges for image processing and stere…

    Submitted 20 August, 2024; originally announced August 2024.

  17. arXiv:2408.10555  [pdf, other]

    cs.LG cs.IR

    Target-Prompt Online Graph Collaborative Learning for Temporal QoS Prediction

    Authors: Shengxiang Hu, Guobing Zou, Song Yang, Shiyi Lin, Bofeng Zhang, Yixin Chen

    Abstract: In service-oriented architecture, accurately predicting the Quality of Service (QoS) is vital for maintaining reliability and enhancing user satisfaction. However, current methods often neglect high-order latent collaborative relationships and fail to dynamically adjust feature learning for specific user-service invocations, which are critical for precise feature extraction. Moreover, relying on R…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    MSC Class: 68T99

    ACM Class: H.4.0; I.2.0

  18. arXiv:2408.10504  [pdf, other]

    cs.AI

    QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning

    Authors: Yilun Kong, Hangyu Mao, Qi Zhao, Bin Zhang, Jingqing Ruan, Li Shen, Yongzhe Chang, Xueqian Wang, Rui Zhao, Dacheng Tao

    Abstract: Prompt engineering has demonstrated remarkable success in enhancing the performance of large language models (LLMs) across diverse tasks. However, most existing prompt optimization methods only focus on the task-level performance, overlooking the importance of query-preferred prompts, which leads to suboptimal performances. Additionally, these methods rely heavily on frequent interactions with LLM…

    Submitted 19 August, 2024; originally announced August 2024.

  19. arXiv:2408.09501  [pdf, other]

    cs.MA cs.AI

    Beyond Local Views: Global State Inference with Diffusion Models for Cooperative Multi-Agent Reinforcement Learning

    Authors: Zhiwei Xu, Hangyu Mao, Nianmin Zhang, Xin Xin, Pengjie Ren, Dapeng Li, Bin Zhang, Guoliang Fan, Zhumin Chen, Changwei Wang, Jiangjin Yin

    Abstract: In partially observable multi-agent systems, agents typically only have access to local observations. This severely hinders their ability to make precise decisions, particularly during decentralized execution. To alleviate this problem and inspired by image outpainting, we propose State Inference with Diffusion Models (SIDIFF), which uses diffusion models to reconstruct the original global state b…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 15 pages, 12 figures

  20. arXiv:2408.09393  [pdf, other]

    cs.LG cs.AI cs.DC

    Federated Graph Learning with Structure Proxy Alignment

    Authors: Xingbo Fu, Zihan Chen, Binchi Zhang, Chen Chen, Jundong Li

    Abstract: Federated Graph Learning (FGL) aims to learn graph learning models over graph data distributed in multiple data owners, which has been applied in various applications such as social recommendation and financial fraud detection. Inherited from generic Federated Learning (FL), FGL similarly has the data heterogeneity issue where the label distribution may vary significantly for distributed graph dat…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by KDD 2024

  21. arXiv:2408.08830  [pdf, other]

    cs.RO

    System Identification For Constrained Robots

    Authors: Bohao Zhang, Daniel Haugk, Ram Vasudevan

    Abstract: Identifying the parameters of robotic systems, such as motor inertia or joint friction, is critical to satisfactory controller synthesis, model analysis, and observer design. Conventional identification techniques are designed primarily for unconstrained systems, such as robotic manipulators. In contrast, the growing importance of legged robots that feature closed kinematic chains or other constra…

    Submitted 16 August, 2024; originally announced August 2024.

  22. arXiv:2408.07467  [pdf, other]

    cs.CV

    Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification

    Authors: Yongcheng Li, Lingcong Cai, Ying Lu, Cheng Lin, Yupeng Zhang, Jingyan Jiang, Genan Dai, Bowen Zhang, Jingzhou Cao, Xiangzhong Zhang, Xiaomao Fan

    Abstract: Accurate classification of blood cells is of vital significance in the diagnosis of hematological disorders. However, in real-world scenarios, domain shifts caused by the variability in laboratory procedures and settings, result in a rapid deterioration of the model's generalization performance. To address this issue, we propose a novel framework of domain-invariant representation learning (DoRL)…

    Submitted 14 August, 2024; originally announced August 2024.

  23. arXiv:2408.07410  [pdf, other]

    cs.CL

    Aquila2 Technical Report

    Authors: Bo-Wen Zhang, Liangdong Wang, Jijie Li, Shuhao Gu, Xinya Wu, Zhengduo Zhang, Boyan Gao, Yulong Ao, Guang Liu

    Abstract: This paper introduces the Aquila2 series, which comprises a wide range of bilingual models with parameter sizes of 7, 34, and 70 billion. These models are trained based on an innovative framework named HeuriMentor (HM), which offers real-time insights into model convergence and enhances the training process and data management. The HM System, comprising the Adaptive Training Engine (ATE), Training…

    Submitted 14 August, 2024; originally announced August 2024.

  24. InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning

    Authors: Bo-Wen Zhang, Yan Yan, Lin Li, Guang Liu

    Abstract: Recent advancements in Chain-of-Thoughts (CoT) and Program-of-Thoughts (PoT) methods have greatly enhanced language models' mathematical reasoning capabilities, facilitating their integration into instruction tuning datasets with LLMs. However, existing methods for large-scale dataset creation require substantial seed data and high computational costs for data synthesis, posing significant challen…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by CIKM 2024

    ACM Class: I.2.7

  25. arXiv:2408.07084  [pdf]

    cs.LG cs.AI

    Dynamic Hypergraph-Enhanced Prediction of Sequential Medical Visits

    Authors: Wangying Yang, Zitao Zheng, Shi Bo, Zhizhong Wu, Bo Zhang, Yuanfang Yang

    Abstract: This study introduces a pioneering Dynamic Hypergraph Networks (DHCE) model designed to predict future medical diagnoses from electronic health records with enhanced accuracy. The DHCE model innovates by identifying and differentiating acute and chronic diseases within a patient's visit history, constructing dynamic hypergraphs that capture the complex, high-order interactions between diseases. It…

    Submitted 19 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  26. arXiv:2408.06716  [pdf, other]

    cs.CV

    Towards Cross-Domain Single Blood Cell Image Classification via Large-Scale LoRA-based Segment Anything Model

    Authors: Yongcheng Li, Lingcong Cai, Ying Lu, Yupeng Zhang, Jingyan Jiang, Genan Dai, Bowen Zhang, Jingzhou Cao, Xiangzhong Zhang, Xiaomao Fan

    Abstract: Accurate classification of blood cells plays a vital role in hematological analysis as it aids physicians in diagnosing various medical conditions. In this study, we present a novel approach for classifying blood cell images known as BC-SAM. BC-SAM leverages the large-scale foundation model of Segment Anything Model (SAM) and incorporates a fine-tuning technique using LoRA, allowing it to extract…

    Submitted 13 August, 2024; originally announced August 2024.

  27. arXiv:2408.06567  [pdf, other]

    cs.CL cs.AI

    AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

    Authors: Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, Chengwei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu , et al. (2 additional authors not shown)

    Abstract: In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch will cost a lot of computation resources while scaling up from a smaller model is a more efficient approach and has thus attracted significant attention…

    Submitted 12 August, 2024; originally announced August 2024.

  28. arXiv:2408.04381  [pdf, other]

    cs.IR

    Understanding and Modeling Job Marketplace with Pretrained Language Models

    Authors: Yaochen Zhu, Liang Wu, Binchi Zhang, Song Wang, Qi Guo, Liangjie Hong, Luke Simon, Jundong Li

    Abstract: Job marketplace is a heterogeneous graph composed of interactions among members (job-seekers), companies, and jobs. Understanding and modeling job marketplace can benefit both job seekers and employers, ultimately contributing to the greater good of the society. However, existing graph neural network (GNN)-based methods have shallow understandings of the associated textual features and heterogeneo…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: accepted by CIKM'24 applied research track

  29. arXiv:2408.04325  [pdf, other]

    eess.AS cs.CL

    HydraFormer: One Encoder For All Subsampling Rates

    Authors: Yaoxun Xu, Xingchen Song, Zhiyong Wu, Di Wu, Zhendong Peng, Binbin Zhang

    Abstract: In automatic speech recognition, subsampling is essential for tackling diverse scenarios. However, the inadequacy of a single subsampling rate to address various real-world situations often necessitates training and deploying multiple models, consequently increasing associated costs. To address this issue, we propose HydraFormer, comprising HydraSub, a Conformer-based encoder, and a BiTransformer-…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: accepted by ICME 2024

  30. arXiv:2408.03131  [pdf, other]

    cs.RO eess.SY

    Stochastic Trajectory Optimization for Demonstration Imitation

    Authors: Chenlin Ming, Zitong Wang, Boxuan Zhang, Xiaoming Duan, Jianping He

    Abstract: Humans often learn new skills by imitating the experts and gradually developing their proficiency. In this work, we introduce Stochastic Trajectory Optimization for Demonstration Imitation (STODI), a trajectory optimization framework for robots to imitate the shape of demonstration trajectories with improved dynamic performance. Consistent with the human learning process, demonstration imitation s…

    Submitted 6 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  31. arXiv:2408.02976  [pdf, ps, other]

    cs.CL cs.AI

    Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation

    Authors: Hui Ma, Bo Zhang, Bo Xu, Jian Wang, Hongfei Lin, Xiao Sun

    Abstract: Empathetic response generation, aiming at understanding the user's situation and feelings and respond empathically, is crucial in building human-like dialogue systems. Previous methods mainly focus on using maximum likelihood estimation as the optimization objective for training response generation models, without taking into account the empathy level alignment between generated responses and targ…

    Submitted 6 August, 2024; originally announced August 2024.

  32. arXiv:2408.02960  [pdf, other]

    cs.AI

    Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic

    Authors: Thomy Phan, Benran Zhang, Shao-Hung Chan, Sven Koenig

    Abstract: Anytime multi-agent path finding (MAPF) is a promising approach to scalable path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach where a fast initial solution is iteratively optimized by destroying and repairing selected paths of the solution. Current MAPF-LNS variants commonly use an adaptive selection mechanism to…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.16767

  33. arXiv:2408.02866  [pdf, other]

    cs.LG math.NA

    Back-Projection Diffusion: Solving the Wideband Inverse Scattering Problem with Diffusion Models

    Authors: Borong Zhang, Martín Guerra, Qin Li, Leonardo Zepeda-Núñez

    Abstract: We present Wideband back-projection diffusion, an end-to-end probabilistic framework for approximating the posterior distribution induced by the inverse scattering map from wideband scattering data. This framework leverages conditional diffusion models coupled with the underlying physics of wave-propagation and symmetries in the problem, to produce highly accurate reconstructions. The framework in…

    Submitted 9 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  34. arXiv:2408.00929  [pdf, other]

    cs.LG cs.CR

    Verification of Machine Unlearning is Fragile

    Authors: Binchi Zhang, Zihan Chen, Cong Shen, Jundong Li

    Abstract: As privacy concerns escalate in the realm of machine learning, data owners now have the option to utilize machine unlearning to remove their data from machine learning models, following recent legislation. To enhance transparency in machine unlearning and avoid potential dishonesty by model providers, various verification strategies have been proposed. These strategies enable data owners to ascert…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ICML 2024

  35. arXiv:2408.00920  [pdf, other]

    cs.LG stat.ML

    Towards Certified Unlearning for Deep Neural Networks

    Authors: Binchi Zhang, Yushun Dong, Tianhao Wang, Jundong Li

    Abstract: In the field of machine unlearning, certified unlearning has been extensively studied in convex machine learning models due to its high efficiency and strong theoretical guarantees. However, its application to deep neural networks (DNNs), known for their highly nonconvex nature, still poses challenges. To bridge the gap between certified unlearning and DNNs, we propose several simple techniques to…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ICML 2024

  36. arXiv:2408.00481  [pdf, other]

    cs.AI

    HBot: A Chatbot for Healthcare Applications in Traditional Chinese Medicine Based on Human Body 3D Visualization

    Authors: Bolin Zhang, Zhiwei Yi, Jiahao Wang, Dianbo Sui, Zhiying Tu, Dianhui Chu

    Abstract: The unique diagnosis and treatment techniques and remarkable clinical efficacy of traditional Chinese medicine (TCM) make it play an important role in the field of elderly care and healthcare, especially in the rehabilitation of some common chronic diseases of the elderly. Therefore, building a TCM chatbot for healthcare application will help users obtain consultation services in a direct and natu…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: System Demonstration

  37. arXiv:2407.21781  [pdf, other]

    cs.RO

    Berkeley Humanoid: A Research Platform for Learning-based Control

    Authors: Qiayuan Liao, Bike Zhang, Xuanyu Huang, Xiaoyu Huang, Zhongyu Li, Koushil Sreenath

    Abstract: We introduce Berkeley Humanoid, a reliable and low-cost mid-scale humanoid research platform for learning-based control. Our lightweight, in-house-built robot is designed specifically for learning algorithms with low simulation complexity, anthropomorphic motion, and high reliability against falls. The robot's narrow sim-to-real gap enables agile and robust locomotion across various terrains in ou…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 12 pages, 9 figures

  38. Air-to-Ground Cooperative OAM Communications

    Authors: Ruirui Chen, Yu Ding, Beibei Zhang, Song Li, Liping Liang

    Abstract: For users in hotspot region, orbital angular momentum (OAM) can realize multifold increase of spectrum efficiency (SE), and the flying base station (FBS) can rapidly support the real-time communication demand. However, the hollow divergence and alignment requirement impose crucial challenges for users to achieve air-to-ground OAM communications, where there exists the line-of-sight path. Therefore…

    Submitted 1 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Journal ref: IEEE WIRELESS COMMUNICATIONS LETTERS, VOL. 13, NO. 4, APRIL 2024

  39. Precoding Based Downlink OAM-MIMO Communications with Rate Splitting

    Authors: Ruirui Chen, Jinyang Lin, Beibei Zhang, Yu Ding, Keyue Xu

    Abstract: Orbital angular momentum (OAM) and rate splitting (RS) are the potential key techniques for the future wireless communications. As a new orthogonal resource, OAM can achieve the multifold increase of spectrum efficiency to relieve the scarcity of the spectrum resource, but how to enhance the privacy performance imposes crucial challenge for OAM communications. RS technique divides the information…

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Journal ref: IEEE TRANSACTIONS ON BROADCASTING, VOL. 69, NO. 4, DECEMBER 2023

  40. arXiv:2407.21075  [pdf, other]

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek , et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.

  41. arXiv:2407.20859  [pdf, other]

    cs.CR cs.LG

    Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification

    Authors: Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang

    Abstract: Recently, autonomous agents built on large language models (LLMs) have experienced significant development and are being deployed in real-world applications. These agents can extend the base LLM's capabilities in multiple ways. For example, a well-built agent using GPT-3.5-Turbo as its core can outperform the more advanced GPT-4 model by leveraging external components. More importantly, the usage…

    Submitted 30 July, 2024; originally announced July 2024.

  42. arXiv:2407.19398  [pdf, other]

    cs.LG

    IDEA: A Flexible Framework of Certified Unlearning for Graph Neural Networks

    Authors: Yushun Dong, Binchi Zhang, Zhenyu Lei, Na Zou, Jundong Li

    Abstract: Graph Neural Networks (GNNs) have been increasingly deployed in a plethora of applications. However, the graph data used for training may contain sensitive personal information of the involved individuals. Once trained, GNNs typically encode such information in their learnable parameters. As a consequence, privacy leakage may happen when the trained GNNs are deployed and exposed to potential attac…

    Submitted 28 July, 2024; originally announced July 2024.

  43. arXiv:2407.16655  [pdf, other]

    cs.CV

    MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

    Authors: Canyu Zhao, Mingyu Liu, Wen Wang, Jianlong Yuan, Hao Chen, Bo Zhang, Chunhua Shen

    Abstract: Recent advancements in video generation have primarily leveraged diffusion models for short-duration content. However, these approaches often fall short in modeling complex narratives and maintaining character consistency over extended periods, which is essential for long-form video production like movies. We propose MovieDreamer, a novel hierarchical framework that integrates the strengths of aut…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 23 pages, 18 figures

  44. arXiv:2407.16533  [pdf, other]

    cs.AI cs.RO

    HAPFI: History-Aware Planning based on Fused Information

    Authors: Sujin Jeon, Suyeon Shin, Byoung-Tak Zhang

    Abstract: Embodied Instruction Following (EIF) is a task of planning a long sequence of sub-goals given high-level natural language instructions, such as "Rinse a slice of lettuce and place on the white table next to the fork". To successfully execute these long-term horizon tasks, we argue that an agent must consider its past, i.e., historical data, when making decisions in each step. Nevertheless, recent…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 7 pages, 3 figures, published to ICRA 2024

  45. arXiv:2407.16224  [pdf, other]

    cs.CV

    OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

    Authors: Ke Sun, Jian Cao, Qi Wang, Linrui Tian, Xindi Zhang, Lian Zhuo, Bang Zhang, Liefeng Bo, Wenbo Zhou, Weiming Zhang, Daiheng Gao

    Abstract: Virtual Try-On (VTON) has become a transformative technology, empowering users to experiment with fashion without ever having to physically try on clothing. However, existing methods often struggle with generating high-fidelity and detail-consistent results. While diffusion models, such as Stable Diffusion series, have shown their capability in creating high-quality and photorealistic images, they…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 10 pages, 13 figures

  46. arXiv:2407.15066  [pdf, other]

    cs.CV

    LSReGen: Large-Scale Regional Generator via Backward Guidance Framework

    Authors: Bowen Zhang, Cheng Yang, Xuanhui Liu

    Abstract: In recent years, advancements in AIGC (Artificial Intelligence Generated Content) technology have significantly enhanced the capabilities of large text-to-image models. Despite these improvements, controllable image generation remains a challenge. Current methods, such as training, forward guidance, and backward guidance, have notable limitations. The first two approaches either demand substantial…

    Submitted 21 July, 2024; originally announced July 2024.

  47. Downstream-Pretext Domain Knowledge Traceback for Active Learning

    Authors: Beichen Zhang, Liang Li, Zheng-Jun Zha, Jiebo Luo, Qingming Huang

    Abstract: Active learning (AL) is designed to construct a high-quality labeled dataset by iteratively selecting the most informative samples. Such sampling heavily relies on data representation, while recently pre-training is popular for robust feature learning. However, as pre-training utilizes low-level pretext tasks that lack annotation, directly using pre-trained representation in AL is inadequate for d…

    Submitted 19 July, 2024; originally announced July 2024.

  48. arXiv:2407.14230  [pdf, other]

    cs.CV cs.LG

    ETSCL: An Evidence Theory-Based Supervised Contrastive Learning Framework for Multi-modal Glaucoma Grading

    Authors: Zhiyuan Yang, Bo Zhang, Yufei Shi, Ningze Zhong, Johnathan Loh, Huihui Fang, Yanwu Xu, Si Yong Yeo

    Abstract: Glaucoma is one of the leading causes of vision impairment. Digital imaging techniques, such as color fundus photography (CFP) and optical coherence tomography (OCT), provide quantitative and noninvasive methods for glaucoma diagnosis. Recently, in the field of computer-aided glaucoma diagnosis, multi-modality methods that integrate the CFP and OCT modalities have achieved greater diagnostic accur…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted by Ophthalmic Medical Image Analysis Workshop at MICCAI'24

  49. arXiv:2407.13920  [pdf, other]

    cs.CV cs.AI

    DuoFormer: Leveraging Hierarchical Visual Representations by Local and Global Attention

    Authors: Xiaoya Tang, Bodong Zhang, Beatrice S. Knudsen, Tolga Tasdizen

    Abstract: We here propose a novel hierarchical transformer model that adeptly integrates the feature extraction capabilities of Convolutional Neural Networks (CNNs) with the advanced representational potential of Vision Transformers (ViTs). Addressing the lack of inductive biases and dependence on extensive training datasets in ViTs, our model employs a CNN backbone to generate hierarchical visual represent…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 11 pages, 5 figures

  50. arXiv:2407.13545  [pdf, other]

    eess.IV cs.CV

    DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays

    Authors: Xuhui Liu, Zhi Qiao, Runkun Liu, Hong Li, Juan Zhang, Xiantong Zhen, Zhen Qian, Baochang Zhang

    Abstract: Computed tomography (CT) is widely utilized in clinical settings because it delivers detailed 3D images of the human body. However, performing CT scans is not always feasible due to radiation exposure and limitations in certain surgical environments. As an alternative, reconstructing CT images from ultra-sparse X-rays offers a valuable solution and has gained significant interest in scientific res…

    Submitted 18 July, 2024; originally announced July 2024.