
Showing 1–50 of 5,035 results for author: Wang, H

Searching in archive cs. Search in all archives.
  1. arXiv:2409.03755  [pdf, other]

    cs.CV

    DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation

    Authors: Wenliang Zhao, Haolin Wang, Jie Zhou, Jiwen Lu

    Abstract: Diffusion probabilistic models (DPMs) have shown remarkable performance in visual synthesis but are computationally expensive due to the need for multiple evaluations during the sampling. Recent predictor-corrector diffusion samplers have significantly reduced the required number of function evaluations (NFE), but inherently suffer from a misalignment issue caused by the extra corrector step, espe…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV 2024

  2. arXiv:2409.03512  [pdf, other]

    cs.CY cs.CL

    From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

    Authors: Jifan Yu, Zheyuan Zhang, Daniel Zhang-li, Shangqing Tu, Zhanxin Hao, Rui Miao Li, Haoxuan Li, Yuanchun Wang, Hanming Li, Linlu Gong, Jie Cao, Jiayin Lin, Jinchang Zhou, Fei Qin, Haohua Wang, Jianxiao Jiang, Lijun Deng, Yisi Zhan, Chaojun Xiao, Xusheng Dai, Xuan Yan, Nianyi Lin, Nan Zhang, Ruixin Ni, Yang Dang , et al. (8 additional authors not shown)

    Abstract: Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still holds significant potential for improvement, new AI technologies have been continuously integ…

    Submitted 5 September, 2024; originally announced September 2024.

  3. arXiv:2409.03504  [pdf, other]

    cs.IR

    HGAMN: Heterogeneous Graph Attention Matching Network for Multilingual POI Retrieval at Baidu Maps

    Authors: Jizhou Huang, Haifeng Wang, Yibo Sun, Miao Fan, Zhengjie Huang, Chunyuan Yuan, Yawen Li

    Abstract: The increasing interest in international travel has raised the demand of retrieving point of interests in multiple languages. This is even superior to find local venues such as restaurants and scenic spots in unfamiliar languages when traveling abroad. Multilingual POI retrieval, enabling users to find desired POIs in a demanded language using queries in numerous languages, has become an indispens…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by KDD'21

  4. arXiv:2409.03271  [pdf, other]

    cs.AI cs.CL cs.HC

    Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation

    Authors: Yu Wang, Shiwan Zhao, Zhihu Wang, Heyuan Huang, Ming Fan, Yubo Zhang, Zhixing Wang, Haijun Wang, Ting Liu

    Abstract: The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs). However, despite their widespread adoption and success, CoT methods often exhibit instability due to their inability to consistently ensure the quality of generated reasoning paths, leading to sub-optimal reasoning performance. To address this challenge,…

    Submitted 5 September, 2024; originally announced September 2024.

  5. arXiv:2409.03256  [pdf, other]

    cs.CL cs.AI

    E2CL: Exploration-based Error Correction Learning for Embodied Agents

    Authors: Hanlin Wang, Chak Tou Leong, Jian Wang, Wenjie Li

    Abstract: Language models are exhibiting increasing capability in knowledge utilization and reasoning. However, when applied as agents in embodied environments, they often suffer from misalignment between their intrinsic knowledge and environmental knowledge, leading to infeasible actions. Traditional environment alignment methods, such as supervised learning on expert trajectories and reinforcement learnin…

    Submitted 5 September, 2024; originally announced September 2024.

  6. arXiv:2409.03249  [pdf, other]

    cs.CV

    Multiple weather images restoration using the task transformer and adaptive mixup strategy

    Authors: Yang Wen, Anyu Lai, Bo Qian, Hao Wang, Wuzhen Shi, Wenming Cao

    Abstract: The current state-of-the-art in severe weather removal predominantly focuses on single-task applications, such as rain removal, haze removal, and snow removal. However, real-world weather conditions often consist of a mixture of several weather types, and the degree of weather mixing in autonomous driving scenarios remains unknown. In the presence of complex and diverse weather conditions, a singl…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 10 pages, 5 figures and 2 tables

  7. arXiv:2409.03215  [pdf, other]

    cs.CL cs.AI cs.LG

    xLAM: A Family of Large Action Models to Empower AI Agent Systems

    Authors: Jianguo Zhang, Tian Lan, Ming Zhu, Zuxin Liu, Thai Hoang, Shirley Kokane, Weiran Yao, Juntao Tan, Akshara Prabhakar, Haolin Chen, Zhiwei Liu, Yihao Feng, Tulika Awalgaonkar, Rithesh Murthy, Eric Hu, Zeyuan Chen, Ran Xu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong

    Abstract: Autonomous agents powered by large language models (LLMs) have attracted significant research interest. However, the open-source community faces many challenges in developing specialized models for agent tasks, driven by the scarcity of high-quality agent datasets and the absence of standard protocols in this area. We introduce and publicly release xLAM, a series of large action models designed fo…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Technical report for the Salesforce xLAM model series

  8. arXiv:2409.02795  [pdf, other]

    cs.CL

    Towards a Unified View of Preference Learning for Large Language Models: A Survey

    Authors: Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang

    Abstract: Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to efficiently enhance the LLM's performance. While effective, research in this area spans multiple domains, and the methods involved are relatively complex to unde…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: Initial Commit, 21 pages

  9. arXiv:2409.02638  [pdf, other]

    cs.CV

    MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos

    Authors: Junyi Ma, Xieyuanli Chen, Wentao Bao, Jingyi Xu, Hesheng Wang

    Abstract: Understanding human intentions and actions through egocentric videos is important on the path to embodied artificial intelligence. As a branch of egocentric vision techniques, hand trajectory prediction plays a vital role in comprehending human motion patterns, benefiting downstream tasks in extended reality and robot manipulation. However, capturing high-level human intentions consistent with rea…

    Submitted 4 September, 2024; originally announced September 2024.

  10. AlignGroup: Learning and Aligning Group Consensus with Member Preferences for Group Recommendation

    Authors: Jinfeng Xu, Zheyu Chen, Jinze Li, Shuo Yang, Hewei Wang, Edith C. -H. Ngai

    Abstract: Group activities are important behaviors in human society, providing personalized recommendations for groups is referred to as the group recommendation task. Existing methods can usually be categorized into two strategies to infer group preferences: 1) determining group preferences by aggregating members' personalized preferences, and 2) inferring group consensus by capturing group members' cohere…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 10 pages, accepted by CIKM 2024

  11. arXiv:2409.02483  [pdf, other]

    cs.CV cs.AI

    TASAR: Transferable Attack on Skeletal Action Recognition

    Authors: Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Ajian Liu, Xingxing Wei, Meng Wang, He Wang

    Abstract: Skeletal sequences, as well-structured representations of human behaviors, are crucial in Human Activity Recognition (HAR). The transferability of adversarial skeletal sequences enables attacks in real-world HAR scenarios, such as autonomous driving, intelligent surveillance, and human-computer interactions. However, existing Skeleton-based HAR (S-HAR) attacks exhibit weak adversarial transferabil…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: arXiv admin note: text overlap with arXiv:2407.08572

  12. arXiv:2409.02246  [pdf, other]

    cs.LG math.OC

    Multi-Agent Reinforcement Learning for Joint Police Patrol and Dispatch

    Authors: Matthew Repasky, He Wang, Yao Xie

    Abstract: Police patrol units need to split their time between performing preventive patrol and being dispatched to serve emergency incidents. In the existing literature, patrol and dispatch decisions are often studied separately. We consider joint optimization of these two decisions to improve police operations efficiency and reduce response time to emergency calls. Methodology/results: We propose a novel…

    Submitted 3 September, 2024; originally announced September 2024.

  13. arXiv:2409.02124  [pdf, other]

    cs.LG cs.AI

    TrajWeaver: Trajectory Recovery with State Propagation Diffusion Model

    Authors: Jinming Wang, Hai Wang, Hongkai Wen, Geyong Min, Man Luo

    Abstract: With the proliferation of location-aware devices, large amount of trajectories have been generated when agents such as people, vehicles and goods flow around the urban environment. These raw trajectories, typically collected from various sources such as GPS in cars, personal mobile devices, and public transport, are often sparse and fragmented due to limited sampling rates, infrastructure coverage…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: First submission, extended to 10 pages include ref

  14. arXiv:2409.02046  [pdf, other]

    cs.CV

    Human-AI Collaborative Multi-modal Multi-rater Learning for Endometriosis Diagnosis

    Authors: Hu Wang, David Butler, Yuan Zhang, Jodie Avery, Steven Knox, Congbo Ma, Louise Hull, Gustavo Carneiro

    Abstract: Endometriosis, affecting about 10% of individuals assigned female at birth, is challenging to diagnose and manage. Diagnosis typically involves the identification of various signs of the disease using either laparoscopic surgery or the analysis of T1/T2 MRI images, with the latter being quicker and cheaper but less accurate. A key diagnostic sign of endometriosis is the obliteration of the Pouch…

    Submitted 5 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

  15. arXiv:2409.02008  [pdf, other]

    cs.NI cs.AI cs.DC

    When Digital Twin Meets 6G: Concepts, Obstacles, and Research Prospects

    Authors: Wenshuai Liu, Yaru Fu, Zheng Shi, Hong Wang

    Abstract: The convergence of digital twin technology and the emerging 6G network presents both challenges and numerous research opportunities. This article explores the potential synergies between digital twin and 6G, highlighting the key challenges and proposing fundamental principles for their integration. We discuss the unique requirements and capabilities of digital twin in the context of 6G networks, s…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 7 pages, 6 figures

  16. arXiv:2409.01995  [pdf, other]

    eess.AS cs.AI cs.SD

    vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders

    Authors: Yiwei Guo, Zhihan Li, Junjie Li, Chenpeng Du, Hankun Wang, Shuai Wang, Xie Chen, Kai Yu

    Abstract: We propose a new speech discrete token vocoder, vec2wav 2.0, which advances voice conversion (VC). We use discrete tokens from speech self-supervised models as the content features of source speech, and treat VC as a prompted vocoding task. To amend the loss of speaker timbre in the content tokens, vec2wav 2.0 utilizes the WavLM features to provide strong timbre-dependent information. A novel adap…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 5 pages, 4 figures

  17. arXiv:2409.01662  [pdf, other]

    cs.CV

    Efficiently Expanding Receptive Fields: Local Split Attention and Parallel Aggregation for Enhanced Large-scale Point Cloud Semantic Segmentation

    Authors: Haodong Wang, Chongyu Wang, Yinghui Quan, Di Wang

    Abstract: Expanding the receptive field in a deep learning model for large-scale 3D point cloud segmentation is an effective technique for capturing rich contextual information, which consequently enhances the network's ability to learn meaningful features. However, this often leads to increased computational complexity and risk of overfitting, challenging the efficiency and effectiveness of the learning pa…

    Submitted 3 September, 2024; originally announced September 2024.

  18. arXiv:2409.01545  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation

    Authors: Chien-Chun Wang, Li-Wei Chen, Hung-Shin Lee, Berlin Chen, Hsin-Min Wang

    Abstract: Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain, leading to a mismatch between training and test conditions. This study puts forward a novel data simulation method to address this issue, leveraging noise-extractive techniques and generative adversarial networks (GANs) with only limited tar…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted to IEEE SLT 2024

  19. arXiv:2409.01541  [pdf, other]

    cs.CV cs.CR

    Purification-Agnostic Proxy Learning for Agentic Copyright Watermarking against Adversarial Evidence Forgery

    Authors: Erjin Bao, Ching-Chun Chang, Hanrui Wang, Isao Echizen

    Abstract: With the proliferation of AI agents in various domains, protecting the ownership of AI models has become crucial due to the significant investment in their development. Unauthorized use and illegal distribution of these models pose serious threats to intellectual property, necessitating effective copyright protection measures. Model watermarking has emerged as a key technique to address this issue…

    Submitted 2 September, 2024; originally announced September 2024.

  20. arXiv:2409.01148  [pdf, other]

    cs.CV cs.AI

    FMRFT: Fusion Mamba and DETR for Query Time Sequence Intersection Fish Tracking

    Authors: Mingyuan Yao, Yukang Huo, Qingbin Tian, Jiayin Zhao, Xiao Liu, Ruifeng Wang, Haihua Wang

    Abstract: Growth, abnormal behavior, and diseases of fish can be early detected by monitoring fish tracking through the method of image processing, which is of great significance for factory aquaculture. However, underwater reflections and some reasons with fish, such as the high similarity, rapid swimming caused by stimuli and multi-object occlusion bring challenges to multi-target tracking of fish. To ad…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 14 pages,14 figures

  21. arXiv:2409.01055  [pdf, other]

    cs.CV

    Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation

    Authors: Qihua Chen, Yue Ma, Hongfa Wang, Junkun Yuan, Wenzhe Zhao, Qi Tian, Hongmei Wang, Shaobo Min, Qifeng Chen, Wei Liu

    Abstract: This paper explores higher-resolution video outpainting with extensive content generation. We point out common issues faced by existing methods when attempting to largely outpaint videos: the generation of low-quality content and limitations imposed by GPU memory. To address these challenges, we propose a diffusion-based method called Follow-Your-Canvas. It builds upon two core designs. F…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Github: https://github.com/mayuelala/FollowYourCanvas Page: https://follow-your-canvas.github.io/

  22. arXiv:2409.01027  [pdf]

    cs.HC

    Mindscape: Research of high-information density street environments based on electroencephalogram recording and virtual reality head-mounted simulation

    Authors: Yijiang Liu, Xiangyu Guan, Hui Wang, Lun Liu

    Abstract: This study aims to investigate, through neuroscientific methods, the effects of particular architectural elements on pedestrian spatial cognition and experience in the analysis and design of walking street spaces. More precisely, this paper will describe the impact of the density variation of storefront signs on the brainwaves of passersby in East Asian city walking streets, providing strategies a…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 10 pages, 10 figures, This paper has been accepted at the eCAADe 2024 Conference

    ACM Class: J.6

  23. arXiv:2409.00962  [pdf, other]

    cs.HC

    Mental-Gen: A Brain-Computer Interface-Based Interactive Method for Interior Space Generative Design

    Authors: Yijiang Liu, Hui Wang

    Abstract: Interior space design significantly influences residents' daily lives. However, the process often presents high barriers and complex reasoning for users, leading to semantic losses in articulating comprehensive requirements and communicating them to designers. This study proposes the Mental-Gen design method, which focuses on interpreting users' spatial design intentions at neural level and expres…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 18 pages, 8 figures

    ACM Class: J.6

  24. arXiv:2409.00925  [pdf, other]

    eess.SP cs.IT

    Convolutional Beamspace Beamforming for Low-Complexity Far-Field and Near-Field MU-MIMO Communications

    Authors: Chao Feng, Huizhi Wang, Yong Zeng

    Abstract: Inter-user interference (IUI) mitigation has been an essential issue for multi-user multiple-input multiple-output (MU-MIMO) communications. The commonly used linear processing schemes include the maximum-ratio combining (MRC), zero-forcing (ZF) and minimum mean squared error (MMSE) beamforming, which may result in the unfavorable performance or complexity as the antenna number grows. In this pape…

    Submitted 1 September, 2024; originally announced September 2024.

  25. arXiv:2409.00889  [pdf, ps, other]

    cs.IT

    Geno-Weaving: Low-Complexity Capacity-Achieving DNA Storage

    Authors: Hsin-Po Wang, Venkatesan Guruswami

    Abstract: As a possible implementation of data storage using DNA, multiple strands of DNA are stored in a liquid container so that, in the future, they can be read by an array of DNA readers in parallel. These readers will sample the strands with replacement to produce a random number of noisy reads for each strand. An essential component of such a data storage system is how to reconstruct data out of these…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 18 pages, 5 figures

  26. arXiv:2409.00744  [pdf, other]

    cs.CV cs.RO

    DSLO: Deep Sequence LiDAR Odometry Based on Inconsistent Spatio-temporal Propagation

    Authors: Huixin Zhang, Guangming Wang, Xinrui Wu, Chenfeng Xu, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan, Hesheng Wang

    Abstract: This paper introduces a 3D point cloud sequence learning model based on inconsistent spatio-temporal propagation for LiDAR odometry, termed DSLO. It consists of a pyramid structure with a spatial information reuse strategy, a sequential pose initialization module, a gated hierarchical pose refinement module, and a temporal feature propagation module. First, spatial features are encoded using a poi…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 6 pages, 5 figures, accepted by IROS 2024

  27. arXiv:2409.00486  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Multi-scale Multi-instance Visual Sound Localization and Segmentation

    Authors: Shentong Mo, Haofan Wang

    Abstract: Visual sound localization is a typical and challenging problem that predicts the location of objects corresponding to the sound source in a video. Previous methods mainly used the audio-visual association between global audio and one-scale visual features to localize sounding objects in each image. Despite their promising performance, they omitted multi-scale visual features of the corresponding i…

    Submitted 31 August, 2024; originally announced September 2024.

  28. arXiv:2409.00461  [pdf, other]

    cs.IT

    Interference-Cancellation-Based Channel Knowledge Map Construction and Its Applications to Channel Estimation

    Authors: Wenjun Jiang, Xiaojun Yuan, Boyu Teng, Hao Wang, Jing Qian

    Abstract: Channel knowledge map (CKM) is viewed as a digital twin of wireless channels, providing location-specific channel knowledge for environment-aware communications. A fundamental problem in CKM-assisted communications is how to construct the CKM efficiently. Current research focuses on interpolating or predicting channel knowledge based on error-free channel knowledge from measured regions, ignoring…

    Submitted 31 August, 2024; originally announced September 2024.

  29. Dynamical system prediction from sparse observations using deep neural networks with Voronoi tessellation and physics constraint

    Authors: Hanyang Wang, Hao Zhou, Sibo Cheng

    Abstract: Despite the success of various methods in addressing the issue of spatial reconstruction of dynamical systems with sparse observations, spatio-temporal prediction for sparse fields remains a challenge. Existing Kriging-based frameworks for spatio-temporal sparse field prediction fail to meet the accuracy and inference time required for nonlinear dynamic prediction problems. In this paper, we intro…

    Submitted 31 August, 2024; originally announced September 2024.

    Journal ref: Computer Methods in Applied Mechanics and Engineering. 2024 Dec 1

  30. arXiv:2409.00388  [pdf, other]

    cs.CV

    A method for detecting dead fish on large water surfaces based on improved YOLOv10

    Authors: Qingbin Tian, Yukang Huo, Mingyuan Yao, Haihua Wang

    Abstract: Dead fish frequently appear on the water surface due to various factors. If not promptly detected and removed, these dead fish can cause significant issues such as water quality deterioration, ecosystem damage, and disease transmission. Consequently, it is imperative to develop rapid and effective detection methods to mitigate these challenges. Conventional methods for detecting dead fish are ofte…

    Submitted 31 August, 2024; originally announced September 2024.

  31. arXiv:2408.17209  [pdf, other]

    cs.DB

    Updateable Data-Driven Cardinality Estimator with Bounded Q-error

    Authors: Yingze Li, Xianglong Liu, Hongzhi Wang, Kaixin Zhang, Zixuan Wang

    Abstract: Modern Cardinality Estimators struggle with data updates. This research tackles this challenge within single-table. We introduce ICE, an Index-based Cardinality Estimator, the first data-driven estimator that enables instant, tuple-leveled updates. ICE has learned two key lessons from the multidimensional index and applied them to solve cardinality estimation in dynamic scenarios: (1) Index poss…

    Submitted 30 August, 2024; originally announced August 2024.

  32. arXiv:2408.16979  [pdf, other]

    cs.CV

    Cross Fusion RGB-T Tracking with Bi-directional Adapter

    Authors: Zhirong Zeng, Xiaotao Liu, Meng Sun, Hongyu Wang, Jing Liu

    Abstract: Many state-of-the-art RGB-T trackers have achieved remarkable results through modality fusion. However, these trackers often either overlook temporal information or fail to fully utilize it, resulting in an ineffective balance between multi-modal and temporal information. To address this issue, we propose a novel Cross Fusion RGB-T Tracking architecture (CFBT) that ensures the full participation o…

    Submitted 29 August, 2024; originally announced August 2024.

  33. arXiv:2408.16767  [pdf, other]

    cs.CV cs.AI cs.GR

    ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

    Authors: Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan

    Abstract: Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a detailed scene from insufficient captured views is still an ill-posed optimization problem, often resulting in artifacts and distortions in unseen areas. In this…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Project page: https://liuff19.github.io/ReconX

  34. arXiv:2408.16766  [pdf, other]

    cs.CV

    CSGO: Content-Style Composition in Text-to-Image Generation

    Authors: Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Renyuan Huang, Zechao Li

    Abstract: The diffusion model has shown exceptional capabilities in controlled image generation, which has further fueled interest in image style transfer. Existing works mainly focus on training free-based methods (e.g., image inversion) due to the scarcity of specific data. In this study, we present a data construction pipeline for content-style-stylized image triplets that generates and automatically cle…

    Submitted 4 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

  35. arXiv:2408.16757  [pdf, other]

    cs.CV cs.AI

    Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks

    Authors: Hongjun Wang, Sagar Vaze, Kai Han

    Abstract: Detecting test-time distribution shift has emerged as a key capability for safely deployed machine learning models, with the question being tackled under various guises in recent years. In this paper, we aim to provide a consolidated view of the two largest sub-fields within the community: out-of-distribution (OOD) detection and open-set recognition (OSR). In particular, we aim to provide rigorous…

    Submitted 29 August, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted to IJCV, preprint version; v2: add supplementary

  36. CanCal: Towards Real-time and Lightweight Ransomware Detection and Response in Industrial Environments

    Authors: Shenao Wang, Feng Dong, Hangfeng Yang, Jingheng Xu, Haoyu Wang

    Abstract: Ransomware attacks have emerged as one of the most significant cybersecurity threats. Despite numerous proposed detection and defense methods, existing approaches face two fundamental limitations in large-scale industrial applications: intolerable system overheads and notorious alert fatigue. To address these challenges, we propose CanCal, a real-time and lightweight ransomware detection system. S…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: To appear in the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS'24), October 14–18, 2024, Salt Lake City

  37. arXiv:2408.16313  [pdf, other]

    cs.CV cs.AI

    FA-YOLO: Research On Efficient Feature Selection YOLO Improved Algorithm Based On FMDS and AGMF Modules

    Authors: Yukang Huo, Mingyuan Yao, Qingbin Tian, Tonghao Wang, Ruifeng Wang, Haihua Wang

    Abstract: Over the past few years, the YOLO series of models has emerged as one of the dominant methodologies in the realm of object detection. Many studies have advanced these baseline models by modifying their architectures, enhancing data quality, and developing new loss functions. However, current models still exhibit deficiencies in processing feature maps, such as overlooking the fusion of cross-scale…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 11 pages and 4 figures

  38. arXiv:2408.16307  [pdf, other]

    cs.RO cs.AI

    Safe Bayesian Optimization for High-Dimensional Control Systems via Additive Gaussian Processes

    Authors: Hongxuan Wang, Xiaocong Li, Adrish Bhaumik, Prahlad Vadakkepat

    Abstract: Controller tuning and optimization have been among the most fundamental problems in robotics and mechatronic systems. The traditional methodology is usually model-based, but its performance heavily relies on an accurate mathematical model of the system. In control applications with complex dynamics, obtaining a precise model is often challenging, leading us towards a data-driven approach. While op…

    Submitted 29 August, 2024; originally announced August 2024.

  39. arXiv:2408.16094  [pdf, ps, other]

    cs.DC

    Monadring: A lightweight consensus protocol to offer Validation-as-a-Service to AVS nodes

    Authors: Yu Zhang, Xiao Yan, Gang Tang, Helena Wang

    Abstract: Existing blockchain networks are often large-scale, requiring transactions to be synchronized across the entire network to reach consensus. On-chain computations can be prohibitively expensive, making many CPU-intensive computations infeasible. Inspired by the structure of IBM's token ring networks, we propose a lightweight consensus protocol called Monadring to address these issues. Monadring all…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 23 pages, 3 figures

  40. arXiv:2408.16061  [pdf, other]

    cs.CV

    3D Reconstruction with Spatial Memory

    Authors: Hengyi Wang, Lourdes Agapito

    Abstract: We present Spann3R, a novel approach for dense 3D reconstruction from ordered or unordered image collections. Built on the DUSt3R paradigm, Spann3R uses a transformer-based architecture to directly regress pointmaps from images without any prior knowledge of the scene or camera parameters. Unlike DUSt3R, which predicts per image-pair pointmaps each expressed in its local coordinate frame, Spann3R…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Project page: https://hengyiwang.github.io/projects/spanner

  41. arXiv:2408.15778  [pdf, other]

    cs.AI cs.CL

    LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models

    Authors: Jiayi Gui, Yiming Liu, Jiale Cheng, Xiaotao Gu, Xiao Liu, Hongning Wang, Yuxiao Dong, Jie Tang, Minlie Huang

    Abstract: Large Language Models (LLMs) have demonstrated notable capabilities across various tasks, showcasing complex problem-solving abilities. Understanding and executing complex rules, along with multi-step planning, are fundamental to logical reasoning and critical for practical LLM agents and decision-making systems. However, evaluating LLMs as effective rule-based executors and planners remains under…

    Submitted 5 September, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  42. arXiv:2408.15207  [pdf, other]

    cs.SE

    Investigating Coverage Criteria in Large Language Models: An In-Depth Study Through Jailbreak Attacks

    Authors: Shide Zhou, Tianlin Li, Kailong Wang, Yihao Huang, Ling Shi, Yang Liu, Haoyu Wang

    Abstract: The swift advancement of large language models (LLMs) has profoundly shaped the landscape of artificial intelligence; however, their deployment in sensitive domains raises grave concerns, particularly due to their susceptibility to malicious exploitation. This situation underscores the insufficiencies in pre-deployment testing, highlighting the urgent need for more rigorous and comprehensive evalu…

    Submitted 27 August, 2024; originally announced August 2024.

  43. arXiv:2408.14506  [pdf, other]

    cs.LG

    Distilling Long-tailed Datasets

    Authors: Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan

    Abstract: Dataset distillation (DD) aims to distill a small, information-rich dataset from a larger one for efficient neural network training. However, existing DD methods struggle with long-tailed datasets, which are prevalent in real-world scenarios. By investigating the reasons behind this unexpected result, we identified two main causes: 1) Expert networks trained on imbalanced data develop biased gradi…

    Submitted 24 August, 2024; originally announced August 2024.

  44. arXiv:2408.14505  [pdf, other]

    cs.LG cs.AI cs.CL

    Empowering Pre-Trained Language Models for Spatio-Temporal Forecasting via Decoupling Enhanced Discrete Reprogramming

    Authors: Hao Wang, Jindong Han, Wei Fan, Hao Liu

    Abstract: Spatio-temporal time series forecasting plays a critical role in various real-world applications, such as transportation optimization, energy management, and climate analysis. The recent advancements in Pre-trained Language Models (PLMs) have inspired efforts to reprogram these models for time series forecasting tasks, by leveraging their superior reasoning and generalization capabilities. However…

    Submitted 24 August, 2024; originally announced August 2024.

  45. arXiv:2408.14491  [pdf, other]

    cs.LG cs.MM

    Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review

    Authors: Clayton Cohn, Eduardo Davalos, Caleb Vatral, Joyce Horn Fonteles, Hanchen David Wang, Meiyi Ma, Gautam Biswas

    Abstract: Recent technological advancements have enhanced our ability to collect and analyze rich multimodal data (e.g., speech, video, and eye gaze) to better inform learning and training experiences. While previous reviews have focused on parts of the multimodal pipeline (e.g., conceptual models and data fusion), a comprehensive literature review on the methods informing multimodal learning and training e…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Submitted to ACM Computing Surveys. Currently under review

  46. arXiv:2408.14023  [pdf, other]

    cs.CV cs.AI

    Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos

    Authors: Jiajun Fei, Dian Li, Zhidong Deng, Zekun Wang, Gang Liu, Hui Wang

    Abstract: Multi-modal large language models (MLLMs) have demonstrated considerable potential across various downstream tasks that require cross-domain knowledge. MLLMs capable of processing videos, known as Video-MLLMs, have attracted broad interest in video-language understanding. However, videos, especially long videos, contain more visual tokens than images, making them difficult for LLMs to process. Exi…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  47. arXiv:2408.13987  [pdf, other]

    cs.CL cs.AI

    Focused Large Language Models are Stable Many-Shot Learners

    Authors: Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

    Abstract: In-Context Learning (ICL) enables large language models (LLMs) to achieve rapid task adaptation by learning from demonstrations. With the increase in available context length of LLMs, recent experiments have shown that the performance of ICL does not necessarily scale well in many-shot (demonstration) settings. We theoretically and experimentally confirm that the reason lies in more demonstrations…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 15 pages

  48. arXiv:2408.13770  [pdf, other]

    cs.CV

    TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

    Authors: Chuanrui Zhang, Yingshuang Zou, Zhuoling Li, Minmin Yi, Haoqian Wang

    Abstract: Compared with previous 3D reconstruction methods like Nerf, recent Generalizable 3D Gaussian Splatting (G-3DGS) methods demonstrate impressive efficiency even in the sparse-view setting. However, the promising reconstruction performance of existing G-3DGS methods relies heavily on accurate multi-view feature matching, which is quite challenging. Especially for the scenes that have many non-overlap…

    Submitted 25 August, 2024; originally announced August 2024.

  49. arXiv:2408.13756  [pdf, ps, other]

    cs.DS

    Revisit the Partial Coloring Method: Prefix Spencer and Sampling

    Authors: Dongrun Cai, Xue Chen, Wenxuan Shu, Haoyu Wang, Guangyi Zou

    Abstract: As the most powerful tool in discrepancy theory, the partial coloring method has wide applications in many problems including the Beck-Fiala problem and Spencer's celebrated result. Currently, there are two major algorithmic methods for the partial coloring method: the first approach uses linear algebraic tools; and the second is called Gaussian measure algorithm. We explore the advantages of thes…

    Submitted 25 August, 2024; originally announced August 2024.

  50. arXiv:2408.13738  [pdf, other]

    cs.CL

    Poor-Supervised Evaluation for SuperLLM via Mutual Consistency

    Authors: Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

    Abstract: The guidance from capability evaluations has greatly propelled the progress of both human society and Artificial Intelligence. However, as LLMs evolve, it becomes challenging to construct evaluation benchmarks for them with accurate labels on hard tasks that approach the boundaries of human capabilities. To credibly conduct evaluation without accurate labels (denoted as poor-supervised evaluation)…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: ACL findings