
Showing 1–50 of 1,466 results for author: Guo, Y

Searching in archive cs.
  1. arXiv:2409.02919  [pdf, other]

    cs.CV

    HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts

    Authors: Xinyu Liu, Yingqing He, Lanqing Guo, Xiang Li, Bu Jin, Peng Li, Yan Li, Chi-Min Chan, Qifeng Chen, Wei Xue, Wenhan Luo, Qingfeng Liu, Yike Guo

    Abstract: The potential for higher-resolution image generation using pretrained diffusion models is immense, yet these models often struggle with issues of object repetition and structural artifacts, especially when scaling to 4K resolution and higher. We find that the problem arises because a single prompt for the generation of multiple scales provides insufficient efficacy. In response, we propos…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.01995  [pdf, other]

    eess.AS cs.AI cs.SD

    vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders

    Authors: Yiwei Guo, Zhihan Li, Junjie Li, Chenpeng Du, Hankun Wang, Shuai Wang, Xie Chen, Kai Yu

    Abstract: We propose a new speech discrete token vocoder, vec2wav 2.0, which advances voice conversion (VC). We use discrete tokens from speech self-supervised models as the content features of source speech, and treat VC as a prompted vocoding task. To amend the loss of speaker timbre in the content tokens, vec2wav 2.0 utilizes the WavLM features to provide strong timbre-dependent information. A novel adap…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 5 pages, 4 figures

  3. arXiv:2409.01500  [pdf, other]

    cs.CV

    Real-Time Multi-Scene Visibility Enhancement for Promoting Navigational Safety of Vessels Under Complex Weather Conditions

    Authors: Ryan Wen Liu, Yuxu Lu, Yuan Gao, Yu Guo, Wenqi Ren, Fenghua Zhu, Fei-Yue Wang

    Abstract: The visible-light camera, which is capable of environment perception and navigation assistance, has emerged as an essential imaging sensor for marine surface vessels in intelligent waterborne transportation systems (IWTS). However, the visual imaging quality inevitably suffers from several kinds of degradations (e.g., limited visibility, low contrast, color distortion, etc.) under complex weather…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 15 pages, 13 figures

    Journal ref: IEEE Transactions on Intelligent Transportation Systems, 2024

  4. arXiv:2409.01092  [pdf, other]

    cs.ET cs.AI cs.NI

    Two-Timescale Synchronization and Migration for Digital Twin Networks: A Multi-Agent Deep Reinforcement Learning Approach

    Authors: Wenshuai Liu, Yaru Fu, Yongna Guo, Fu Lee Wang, Wen Sun, Yan Zhang

    Abstract: Digital twins (DTs) have emerged as a promising enabler for representing the real-time states of physical worlds and realizing self-sustaining systems. In practice, DTs of physical devices, such as mobile users (MUs), are commonly deployed in multi-access edge computing (MEC) networks for the sake of reducing latency. To ensure the accuracy and fidelity of DTs, it is essential for MUs to regularly…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 15 pages, 14 figures

    ACM Class: C.2.3; C.2.4

  5. GCCRR: A Short Sequence Gait Cycle Segmentation Method Based on Ear-Worn IMU

    Authors: Zhenye Xu, Yao Guo

    Abstract: This paper addresses the critical task of gait cycle segmentation using short sequences from ear-worn IMUs, a practical and non-invasive approach for home-based monitoring and rehabilitation of patients with impaired motor function. While previous studies have focused on IMUs positioned on the lower limbs, ear-worn IMUs offer a unique advantage in capturing gait dynamics with minimal intrusion. To…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted by EarComp2024

  6. arXiv:2409.00304  [pdf, other]

    cs.CV

    StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models

    Authors: Yuxiang Guo, Faizan Siddiqui, Yang Zhao, Rama Chellappa, Shao-Yuan Lo

    Abstract: Predicting and reasoning how a video would make a human feel is crucial for developing socially intelligent systems. Although Multimodal Large Language Models (MLLMs) have shown impressive video understanding capabilities, they tend to focus more on the semantic content of videos, often overlooking emotional stimuli. Hence, most existing MLLMs fall short in estimating viewers' emotional reactions…

    Submitted 30 August, 2024; originally announced September 2024.

  7. Deep learning surrogate models of JULES-INFERNO for wildfire prediction on a global scale

    Authors: Sibo Cheng, Hector Chassagnon, Matthew Kasoar, Yike Guo, Rossella Arcucci

    Abstract: Global wildfire models play a crucial role in anticipating and responding to changing wildfire regimes. JULES-INFERNO is a global vegetation and fire model simulating wildfire emissions and area burnt on a global scale. However, because of the high data dimensionality and system complexity, JULES-INFERNO's computational costs make it challenging to apply to fire risk forecasting with unseen initia…

    Submitted 30 August, 2024; originally announced September 2024.

  8. arXiv:2409.00097  [pdf, other]

    cs.CL cs.AI

    Large Language Models for Disease Diagnosis: A Scoping Review

    Authors: Shuang Zhou, Zidu Xu, Mian Zhang, Chunpu Xu, Yawen Guo, Zaifu Zhan, Sirui Ding, Jiashuo Wang, Kaishuai Xu, Yi Fang, Liqiao Xia, Jeremy Yeung, Daochen Zha, Mingquan Lin, Rui Zhang

    Abstract: Automatic disease diagnosis has become increasingly valuable in clinical practice. The advent of large language models (LLMs) has catalyzed a paradigm shift in artificial intelligence, with growing evidence supporting the efficacy of LLMs in diagnostic tasks. Despite the growing attention in this field, many critical research questions remain under-explored. For instance, what diseases and LLM tec…

    Submitted 26 August, 2024; originally announced September 2024.

    Comments: 57 pages

  9. arXiv:2408.17175  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

    Authors: Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue

    Abstract: Recent advancements in audio generation have been significantly propelled by the capabilities of Large Language Models (LLMs). The existing research on audio LLM has primarily focused on enhancing the architecture and scale of audio language models, as well as leveraging larger datasets, and generally, acoustic codecs, such as EnCodec, are used for audio tokenization. However, these codecs were or…

    Submitted 30 August, 2024; originally announced August 2024.

  10. arXiv:2408.16400  [pdf, other]

    cs.CR

    Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection

    Authors: Yuejun Guo, Constantinos Patsakis, Qiang Hu, Qiang Tang, Fran Casino

    Abstract: The significant increase in software production driven by automation and faster development lifecycles has resulted in a corresponding surge in software vulnerabilities. In parallel, the evolving landscape of software vulnerability detection, highlighting the shift from traditional methods to machine learning and large language models (LLMs), provides massive opportunities at the cost of resource-…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted to ESORICS 2024

  11. arXiv:2408.16373  [pdf, other]

    cs.SD eess.AS

    Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis

    Authors: Zehai Tu, Guangyan Zhang, Yiting Lu, Adaeze Adigwe, Simon King, Yiwen Guo

    Abstract: Tokenising continuous speech into sequences of discrete tokens and modelling them with language models (LMs) has led to significant success in text-to-speech (TTS) synthesis. Although these models can generate speech with high quality and naturalness, their synthesised samples can still suffer from artefacts, mispronunciation, word repeating, etc. In this paper, we argue these undesirable properti…

    Submitted 29 August, 2024; originally announced August 2024.

  12. arXiv:2408.16300  [pdf, other]

    cs.NE math.OC

    A Distance Similarity-based Genetic Optimization Algorithm for Satellite Ground Network Planning Considering Feeding Mode

    Authors: Yingying Ren, Qiuli Li, Yangyang Guo, Witold Pedrycz, Lining Xing, Anfeng Liu, Yanjie Song

    Abstract: With the rapid development of the satellite industry, the information transmission network based on communication satellites has gradually become a major and important part of the future satellite ground integration network. However, the low transmission efficiency of the satellite data relay back mission has become a problem that is currently constraining the construction of the system and needs…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 25 pages

  13. arXiv:2408.16251  [pdf, other]

    cs.IT eess.SP

    Neural Network-Assisted Hybrid Model Based Message Passing for Parametric Holographic MIMO Near Field Channel Estimation

    Authors: Zhengdao Yuan, Yabo Guo, Dawei Gao, Qinghua Guo, Zhongyong Wang, Chongwen Huang, Ming Jin, Kai-Kit Wong

    Abstract: Holographic multiple-input and multiple-output (HMIMO) is a promising technology with the potential to achieve high energy and spectral efficiencies, enhance system capacity and diversity, etc. In this work, we address the challenge of HMIMO near field (NF) channel estimation, which is complicated by the intricate model introduced by the dyadic Green's function. Despite its complexity, the channel…

    Submitted 29 August, 2024; originally announced August 2024.

  14. arXiv:2408.15076  [pdf, other]

    cs.LG cs.AI

    MiWaves Reinforcement Learning Algorithm

    Authors: Susobhan Ghosh, Yongyi Guo, Pei-Yao Hung, Lara Coughlin, Erin Bonar, Inbal Nahum-Shani, Maureen Walton, Susan Murphy

    Abstract: The escalating prevalence of cannabis use poses a significant public health challenge globally. In the U.S., cannabis use is more prevalent among emerging adults (EAs) (ages 18-25) than any other age group, with legalization in multiple states contributing to a public perception that cannabis is less risky than in prior decades. To address this growing concern, we developed MiWaves, a reinforc…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.17739

  15. arXiv:2408.14972  [pdf, other]

    cs.CL

    AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems

    Authors: Chi-Min Chan, Jianxuan Yu, Weize Chen, Chunyang Jiang, Xinyu Liu, Weijie Shi, Zhiyuan Liu, Wei Xue, Yike Guo

    Abstract: The rapid advancement of large language models (LLMs) has led to the rise of LLM-based agents. Recent research shows that multi-agent systems (MAS), where each agent plays a specific role, can outperform individual LLMs. However, configuring an MAS for a task remains challenging, with performance only observable post-execution. Inspired by scaling laws in LLM development, we investigate whether MA…

    Submitted 27 August, 2024; originally announced August 2024.

  16. arXiv:2408.14957  [pdf, other]

    cs.CV

    Applying ViT in Generalized Few-shot Semantic Segmentation

    Authors: Liyuan Geng, Jinhong Xia, Yuanhe Guo

    Abstract: This paper explores the capability of ViT-based models under the generalized few-shot semantic segmentation (GFSS) framework. We conduct experiments with various combinations of backbone models, including ResNets and pretrained Vision Transformer (ViT)-based models, along with decoders featuring a linear classifier, UPerNet, and Mask Transformer. The structure made of DINOv2 and linear classifier…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 7 pages, 4 figures

  17. arXiv:2408.14585  [pdf, other]

    cs.CV cs.SD eess.AS

    Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities

    Authors: Yidi Li, Yihan Li, Yixin Guo, Bin Ren, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: In speaker tracking research, integrating and complementing multi-modal data is a crucial strategy for improving the accuracy and robustness of tracking systems. However, tracking with incomplete modalities remains a challenging issue due to noisy observations caused by occlusion, acoustic noise, and sensor failures. Especially when there is missing data in multiple modalities, the performance of…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Audio-Visual Speaker Tracking with Incomplete Modalities

  18. arXiv:2408.14472  [pdf, other]

    cs.RO cs.AI eess.SY

    Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning

    Authors: Xinyang Gu, Yen-Jen Wang, Xiang Zhu, Chengming Shi, Yanjiang Guo, Yichen Liu, Jianyu Chen

    Abstract: Humanoid robots, with their human-like skeletal structure, are especially suited for tasks in human-centric environments. However, this structure is accompanied by additional challenges in locomotion controller design, especially in complex real-world environments. As a result, existing humanoid robots are limited to relatively simple terrains, either with model-based control or model-free reinfor…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Robotics: Science and Systems (RSS), 2024. (Best Paper Award Finalist)

  19. arXiv:2408.14158  [pdf, other]

    cs.DC cs.AI

    Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning

    Authors: Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, et al. (27 additional authors not shown)

    Abstract: The rapid progress in Deep Learning (DL) and Large Language Models (LLMs) has exponentially increased demands of computational power and bandwidth. This, combined with the high costs of faster computing chips and interconnects, has significantly inflated High Performance Computing (HPC) construction costs. To address these challenges, we introduce the Fire-Flyer AI-HPC architecture, a synergistic…

    Submitted 31 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: This is the preprint version of the paper accepted for presentation at the 2024 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'24). © 2024 IEEE. Personal use of this material is permitted. For other uses, permission from IEEE must be obtained. Please refer to IEEE Xplore for the final published version

  20. arXiv:2408.13454  [pdf, other]

    cs.CV

    AdaOcc: Adaptive-Resolution Occupancy Prediction

    Authors: Chao Chen, Ruoyu Wang, Yuliang Guo, Cheng Zhao, Xinyu Huang, Chen Feng, Liu Ren

    Abstract: Autonomous driving in complex urban scenarios requires 3D perception to be both comprehensive and precise. Traditional 3D perception methods focus on object detection, resulting in sparse representations that lack environmental detail. Recent approaches estimate 3D occupancy around vehicles for a more comprehensive scene representation. However, dense 3D occupancy prediction increases computationa…

    Submitted 23 August, 2024; originally announced August 2024.

  21. arXiv:2408.13370  [pdf, other]

    cs.CV cs.GR

    BiGS: Bidirectional Gaussian Primitives for Relightable 3D Gaussian Splatting

    Authors: Zhenyuan Liu, Yu Guo, Xinyuan Li, Bernd Bickel, Ran Zhang

    Abstract: We present Bidirectional Gaussian Primitives, an image-based novel view synthesis technique designed to represent and render 3D objects with surface and volumetric materials under dynamic illumination. Our approach integrates light intrinsic decomposition into the Gaussian splatting framework, enabling real-time relighting of 3D objects. To unify surface and volumetric material within a cohesive a…

    Submitted 23 August, 2024; originally announced August 2024.

  22. arXiv:2408.13005  [pdf, other]

    cs.CV

    EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation

    Authors: Cong Wang, Jiaxi Gu, Panwen Hu, Haoyu Zhao, Yuanfan Guo, Jianhua Han, Hang Xu, Xiaodan Liang

    Abstract: Following the advancements in text-guided image generation technology exemplified by Stable Diffusion, video generation is gaining increased attention in the academic community. However, relying solely on text guidance for video generation has serious limitations, as videos contain much richer content than images, especially in terms of motion. This information can hardly be adequately described w…

    Submitted 23 August, 2024; originally announced August 2024.

  23. arXiv:2408.11656  [pdf, other]

    cs.LG

    Macformer: Transformer with Random Maclaurin Feature Attention

    Authors: Yuhan Guo, Lizhong Ding, Ye Yuan, Guoren Wang

    Abstract: Random feature attention (RFA) adopts random Fourier feature (RFF) methods to approximate the softmax function, resulting in a linear time and space attention mechanism that enables the construction of an efficient Transformer. Inspired by RFA, we propose Macformer, a Transformer architecture that employs random Maclaurin features (RMF) to approximate various dot-product kernels, thereby accelerat…

    Submitted 21 August, 2024; originally announced August 2024.

  24. arXiv:2408.11155  [pdf, other]

    cs.RO

    Range-based Multi-Robot Integrity Monitoring Against Cyberattacks and Faults: An Anchor-Free Approach

    Authors: Vishnu Vijay, Kartik A. Pant, Minhyun Cho, Yifan Guo, James M. Goppert, Inseok Hwang

    Abstract: Coordination of multi-robot systems (MRSs) relies on efficient sensing and reliable communication among the robots. However, the sensors and communication channels of these robots are often vulnerable to cyberattacks and faults, which can disrupt their individual behavior and the overall objective of the MRS. In this work, we present a multi-robot integrity monitoring framework that utilizes inter…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 8 pages, 7 figures

  25. arXiv:2408.10527  [pdf, other]

    cs.CV cs.AI

    EdgeNAT: Transformer for Efficient Edge Detection

    Authors: Jinghuai Jie, Yan Guo, Guixing Wu, Junmin Wu, Baojian Hua

    Abstract: Transformers, renowned for their powerful feature extraction capabilities, have played an increasingly prominent role in various vision tasks. In particular, recent advancements present transformers with hierarchical structures, such as the Dilated Neighborhood Attention Transformer (DiNAT), demonstrating an outstanding ability to efficiently capture both global and local features. However, transformers' appl…

    Submitted 20 August, 2024; originally announced August 2024.

  26. arXiv:2408.10280  [pdf, other]

    cs.LG

    NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models

    Authors: Cheng Lin, Lujun Li, Dezhi Li, Jie Zou, Wei Xue, Yike Guo

    Abstract: In this paper, we introduce Nested Low-Rank Adaptation (NoRA), a novel approach to parameter-efficient fine-tuning that extends the capabilities of Low-Rank Adaptation (LoRA) techniques. Vanilla LoRA overlooks pre-trained weight inheritance and still requires fine-tuning numerous parameters. To address these issues, our NoRA adopts a dual-layer nested structure with Singular Value Decomposition…

    Submitted 27 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: Work in progress, revisions ongoing

  27. arXiv:2408.09849  [pdf, other]

    cs.CL cs.AI

    Importance Weighting Can Help Large Language Models Self-Improve

    Authors: Chunyang Jiang, Chi-min Chan, Wei Xue, Qifeng Liu, Yike Guo

    Abstract: Large language models (LLMs) have shown remarkable capability in numerous tasks and applications. However, fine-tuning LLMs using high-quality datasets under external supervision remains prohibitively expensive. In response, LLM self-improvement approaches have been vibrantly developed recently. The typical paradigm of LLM self-improvement involves training LLM on self-generated data, part of whic…

    Submitted 19 August, 2024; originally announced August 2024.

  28. arXiv:2408.09333  [pdf, other]

    cs.CL

    SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama

    Authors: Jing Tang, Quanlu Jia, Yuqiang Xie, Zeyu Gong, Xiang Wen, Jiayi Zhang, Yalong Guo, Guibin Chen, Jiangping Yang

    Abstract: Generating high-quality shooting scripts containing information such as scene and shot language is essential for short drama script generation. We collect 6,660 popular short drama episodes from the Internet, each with an average of 100 short episodes, and the total number of short episodes is about 80,000, with a total duration of about 2,000 hours and totaling 10 terabytes (TB). We perform keyfr…

    Submitted 28 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

    Comments: 18 pages, 12 figures

  29. arXiv:2408.09198  [pdf, other]

    cs.RO

    Learning Based Toolpath Planner on Diverse Graphs for 3D Printing

    Authors: Yuming Huang, Yuhu Guo, Renbo Su, Xingjian Han, Junhao Ding, Tianyu Zhang, Tao Liu, Weiming Wang, Guoxin Fang, Xu Song, Emily Whiting, Charlie C. L. Wang

    Abstract: This paper presents a learning based planner for computing optimized 3D printing toolpaths on prescribed graphs, the challenges of which include the varying graph structures on different models and the large scale of nodes & edges on a graph. We adopt an on-the-fly strategy to tackle these challenges, formulating the planner as a Deep Q-Network (DQN) based optimizer to decide the next 'best' node…

    Submitted 17 August, 2024; originally announced August 2024.

  30. arXiv:2408.09191  [pdf, other]

    cs.CV

    GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System

    Authors: Shuo Wang, Yongcai Wang, Zhimin Xu, Yongyu Guo, Wanting Li, Zhe Huang, Xuewei Bai, Deying Li

    Abstract: For interacting with mobile objects in unfamiliar environments, simultaneously locating, mapping, and tracking the 3D poses of multiple objects are crucially required. This paper proposes a Tracklet Graph and Query Graph-based framework, i.e., GSLAMOT, to address this challenge. GSLAMOT utilizes camera and LiDAR multimodal information as inputs and divides the representation of the dynamic scene i…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 11 pages, 9 figures, ACM MM 2024

  31. arXiv:2408.09013  [pdf, other]

    cs.LG eess.SP

    An optimal pairwise merge algorithm improves the quality and consistency of nonnegative matrix factorization

    Authors: Youdong Guo, Timothy E. Holy

    Abstract: Non-negative matrix factorization (NMF) is a key technique for feature extraction and widely used in source separation. However, existing algorithms may converge to poor local minima, or to one of several minima with similar objective value but differing feature parametrizations. Additionally, the performance of NMF greatly depends on the number of components, but choosing the optimal count remain…

    Submitted 16 August, 2024; originally announced August 2024.

  32. arXiv:2408.08537  [pdf, other]

    cs.CR cs.SE

    SeeWasm: An Efficient and Fully-Functional Symbolic Execution Engine for WebAssembly Binaries

    Authors: Ningyu He, Zhehao Zhao, Hanqin Guan, Jikai Wang, Shuo Peng, Ding Li, Haoyu Wang, Xiangqun Chen, Yao Guo

    Abstract: WebAssembly (Wasm), as a compact, fast, and isolation-guaranteed binary format, can be compiled from more than 40 high-level programming languages. However, vulnerabilities in Wasm binaries could lead to sensitive data leakage and even threaten their hosting environments. To identify them, symbolic execution is widely adopted due to its soundness and the ability to automatically generate exploitat…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted by ISSTA'24 Demo Track, the tool can be accessed at https://github.com/PKU-ASAL/SeeWasm

  33. arXiv:2408.08515  [pdf, other]

    cs.SE

    Selecting Initial Seeds for Better JVM Fuzzing

    Authors: Tianchang Gao, Junjie Chen, Dong Wang, Yile Guo, Yingquan Zhao, Zan Wang

    Abstract: Literature in traditional program fuzzing has confirmed that effectiveness is largely impacted by redundancy among initial seeds, thereby proposing a series of seed selection methods. JVM fuzzing, compared to traditional ones, presents unique characteristics, including large-scale and intricate code, and programs with both syntactic and semantic features. However, it remains unclear whether the ex…

    Submitted 16 August, 2024; originally announced August 2024.

  34. arXiv:2408.08260  [pdf, other]

    cs.LG eess.SP

    GSVD-NMF: Recovering Missing Features in Non-negative Matrix Factorization

    Authors: Youdong Guo, Timothy E. Holy

    Abstract: Non-negative matrix factorization (NMF) is an important tool in signal processing and widely used to separate mixed sources into their components. However, NMF is NP-hard and thus may fail to discover the ideal factorization; moreover, the number of components may not be known in advance and thus features may be missed or incompletely separated. To recover missing components from under-complete NM…

    Submitted 15 August, 2024; originally announced August 2024.

  35. arXiv:2408.07476  [pdf, other]

    cs.CV

    One Step Diffusion-based Super-Resolution with Time-Aware Distillation

    Authors: Xiao He, Huaao Tang, Zhijun Tu, Junchao Zhang, Kun Cheng, Hanting Chen, Yong Guo, Mingrui Zhu, Nannan Wang, Xinbo Gao, Jie Hu

    Abstract: Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts. However, these approaches typically require tens or even hundreds of iterative samplings, resulting in significant latency. Recently, techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowl…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 18 pages

  36. arXiv:2408.06776  [pdf, other]

    eess.SY cs.AI

    Robust Deep Reinforcement Learning for Inverter-based Volt-Var Control in Partially Observable Distribution Networks

    Authors: Qiong Liu, Ye Guo, Tong Xu

    Abstract: Inverter-based volt-var control is studied in this paper. One key issue in DRL-based approaches is the limited measurement deployment in active distribution networks, which leads to problems of a partially observable state and unknown reward. To address those problems, this paper proposes a robust DRL approach with a conservative critic and a surrogate reward. The conservative critic utilizes the…

    Submitted 13 August, 2024; originally announced August 2024.

  37. How to Best Combine Demosaicing and Denoising?

    Authors: Yu Guo, Qiyu Jin, Jean-Michel Morel, Gabriele Facciolo

    Abstract: Image demosaicing and denoising play a critical role in the raw imaging pipeline. These processes have often been treated as independent, without considering their interactions. Indeed, most classic denoising methods handle noisy RGB images, not raw images. Conversely, most demosaicing methods address the demosaicing of noise-free images. The real problem is to jointly denoise and demosaic noisy r…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: This paper was accepted by Inverse Problems and Imaging in October 2023

    Journal ref: Inverse Problems and Imaging, 2024, 18(3):571-599

  38. arXiv:2408.06656  [pdf, other]

    cs.RO

    MAPPO-PIS: A Multi-Agent Proximal Policy Optimization Method with Prior Intent Sharing for CAVs' Cooperative Decision-Making

    Authors: Yicheng Guo, Jiaqi Liu, Rongjie Yu, Peng Hang, Jian Sun

    Abstract: Vehicle-to-Vehicle (V2V) technologies have great potential for enhancing traffic flow efficiency and safety. However, cooperative decision-making in multi-agent systems, particularly in complex human-machine mixed merging areas, remains challenging for connected and autonomous vehicles (CAVs). Intent sharing, a key aspect of human coordination, may offer an effective solution to these decision-mak…

    Submitted 26 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  39. Deep Inertia $L_p$ Half-Quadratic Splitting Unrolling Network for Sparse View CT Reconstruction

    Authors: Yu Guo, Caiying Wu, Yaxin Li, Qiyu Jin, Tieyong Zeng

    Abstract: Sparse view computed tomography (CT) reconstruction poses a challenging ill-posed inverse problem, necessitating effective regularization techniques. In this letter, we employ $L_p$-norm ($0<p<1$) regularization to induce sparsity and introduce inertial steps, leading to the development of the inertial $L_p$-norm half-quadratic splitting algorithm. We rigorously prove the convergence of this algor…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: This paper was accepted by IEEE Signal Processing Letters on July 28, 2024

    Journal ref: IEEE Signal Processing Letters, 2024, 31:2030-2034

  40. arXiv:2408.06569  [pdf, other]

    cs.CL cs.AI

    Social Debiasing for Fair Multi-modal LLMs

    Authors: Harry Cheng, Yangyang Guo, Qingpei Guo, Ming Yang, Tian Gan, Liqiang Nie

    Abstract: Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities. However, these models often inherit severe social biases from their training datasets, leading to unfair predictions based on attributes like race and gender. This paper addresses the issue of social biases in MLLMs by i) Introducing a comprehensive Counterfactual da…

    Submitted 12 August, 2024; originally announced August 2024.

  41. Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection

    Authors: Yixin Guo, Yu Liu, Jianghao Li, Weimin Wang, Qi Jia

    Abstract: A zero-shot human-object interaction (HOI) detector is capable of generalizing to HOI categories not encountered during training. Inspired by the impressive zero-shot capabilities offered by CLIP, the latest methods strive to leverage CLIP embeddings for improving zero-shot HOI detection. However, these embedding-based methods train the classifier on seen classes only, inevitably resulting in seen-…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

  42. arXiv:2408.04344  [pdf, other]

    cs.SE

    Semantic-Enhanced Indirect Call Analysis with Large Language Models

    Authors: Baijun Cheng, Cen Zhang, Kailong Wang, Ling Shi, Yang Liu, Haoyu Wang, Yao Guo, Xiangqun Chen

    Abstract: In contemporary software development, the widespread use of indirect calls to achieve dynamic features poses challenges in constructing precise control flow graphs (CFGs), which further impacts the performance of downstream static analysis tasks. To tackle this issue, various types of indirect call analyzers have been proposed. However, they do not fully leverage the semantic information of the pr…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by ASE'24

  43. arXiv:2408.03538  [pdf, other]

    cs.CV

    PRTGS: Precomputed Radiance Transfer of Gaussian Splats for Real-Time High-Quality Relighting

    Authors: Yijia Guo, Yuanxi Bai, Liwen Hu, Ziyi Guo, Mianzhi Liu, Yu Cai, Tiejun Huang, Lei Ma

    Abstract: We propose Precomputed Radiance Transfer of Gaussian Splats (PRTGS), a real-time high-quality relighting method for Gaussian splats in low-frequency lighting environments that captures soft shadows and interreflections by precomputing 3D Gaussian splats' radiance transfer. Existing studies have demonstrated that 3D Gaussian splatting (3DGS) outperforms neural fields in efficiency for dynamic lighting…

    Submitted 7 August, 2024; originally announced August 2024.

  44. arXiv:2408.03482  [pdf, other]

    cs.CR

    Beyond App Markets: Demystifying Underground Mobile App Distribution Via Telegram

    Authors: Yanhui Guo, Dong Wang, Liu Wang, Yongsheng Fang, Chao Wang, Minghui Yang, Tianming Liu, Haoyu Wang

    Abstract: The thriving mobile app ecosystem encompasses a wide range of functionalities. However, within this ecosystem, a subset of apps provides illicit services such as gambling and pornography to pursue economic gains, collectively referred to as "underground economy apps". While previous studies have examined these apps' characteristics and identification methods, investigations into their distribution…

    Submitted 6 August, 2024; originally announced August 2024.

  45. arXiv:2408.01803  [pdf, other]

    cs.LG cs.CL

    STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

    Authors: Peijie Dong, Lujun Li, Dayou Du, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo, Xiaowen Chu

    Abstract: In this paper, we present STBLLM, the first structural binarization framework for compressing Large Language Models (LLMs) to less than 1-bit precision. LLMs have achieved remarkable performance, but their heavy memory requirements have hindered widespread adoption, particularly on resource-constrained devices. Binarization, which quantizes weights to a mere 1 bit, achieves a milestone in increas…

    Submitted 3 August, 2024; originally announced August 2024.

  46. arXiv:2408.01471  [pdf, other]

    cs.CV cs.RO

    Enhancing Online Road Network Perception and Reasoning with Standard Definition Maps

    Authors: Hengyuan Zhang, David Paz, Yuliang Guo, Arun Das, Xinyu Huang, Karsten Haug, Henrik I. Christensen, Liu Ren

    Abstract: Autonomous driving for urban and highway driving applications often requires High Definition (HD) maps to generate a navigation plan. Nevertheless, various challenges arise when generating and maintaining HD maps at scale. While recent online mapping methods have started to emerge, their performance especially for longer ranges is limited by heavy occlusion in dynamic environments. With these cons…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  47. arXiv:2408.01038  [pdf, other]

    cs.CL

    UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents

    Authors: Yi Tu, Chong Zhang, Ya Guo, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang

    Abstract: The recognition of named entities in visually-rich documents (VrD-NER) plays a critical role in various real-world scenarios and applications. However, the research in VrD-NER faces three major challenges: complex document layouts, incorrect reading orders, and unsuitable task formulations. To address these challenges, we propose a query-aware entity extraction head, namely UNER, to collaborate wi…

    Submitted 11 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Multimedia 2024

  48. arXiv:2407.21531  [pdf, other]

    cs.SD cs.CL cs.MM eess.AS

    Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

    Authors: Ziya Zhou, Yuhang Wu, Zhiyue Wu, Xinyue Zhang, Ruibin Yuan, Yinghao Ma, Lu Wang, Emmanouil Benetos, Wei Xue, Yike Guo

    Abstract: Symbolic Music, akin to language, can be encoded in discrete symbols. Recent research has extended the application of large language models (LLMs) such as GPT-4 and Llama2 to the symbolic music domain including understanding and generation. Yet scant research explores the details of how these LLMs perform on advanced music understanding and conditioned generation, especially from the multi-step re…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ISMIR2024

  49. arXiv:2407.21075  [pdf, other]

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.

  50. arXiv:2407.20962  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

    Authors: Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo

    Abstract: Massive multi-modality datasets play a significant role in facilitating the success of large video-language models. However, current video-language datasets primarily provide text descriptions for visual frames, considering audio to be weakly related information. They usually overlook exploring the potential of inherent audio-visual correlation, leading to monotonous annotation within each modalit…

    Submitted 6 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 15 Pages. Dataset report