Showing 1–50 of 483 results for author: Zheng, L

Searching in archive cs.
  1. arXiv:2409.02979  [pdf, other]

    cs.CV

    Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors

    Authors: Haiyu Wu, Jaskirat Singh, Sicong Tian, Liang Zheng, Kevin W. Bowyer

    Abstract: This paper studies how to synthesize face images of non-existent persons, to create a dataset that allows effective training of face recognition (FR) models. Two important goals are (1) the ability to generate a large number of distinct identities (inter-class separation) with (2) a wide variation in appearance of each identity (intra-class variation). However, existing works 1) are typically limi…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.02802  [pdf, other]

    cs.LG cs.CR stat.ML

    Boosting Certificate Robustness for Time Series Classification with Efficient Self-Ensemble

    Authors: Chang Dong, Zhengyang Li, Liangwei Zheng, Weitong Chen, Wei Emma Zhang

    Abstract: Recently, the issue of adversarial robustness in the time series domain has garnered significant attention. However, the available defense mechanisms remain limited, with adversarial training being the predominant approach, though it does not provide theoretical guarantees. Randomized Smoothing has emerged as a standout method due to its ability to certify a provable lower bound on robustness radi…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 6 figures, 4 tables, 10 pages

    ACM Class: H.3.3

  3. arXiv:2409.02581  [pdf, other]

    cs.CV

    Object Gaussian for Monocular 6D Pose Estimation from Sparse Views

    Authors: Luqing Luo, Shichu Sun, Jiangang Yang, Linfang Zheng, Jinwei Du, Jian Liu

    Abstract: Monocular object pose estimation, as a pivotal task in computer vision and robotics, heavily depends on accurate 2D-3D correspondences, which often demand costly CAD models that may not be readily available. Object 3D reconstruction methods offer an alternative, among which recent advancements in 3D Gaussian Splatting (3DGS) afford a compelling potential. Yet its performance still suffers and tend…

    Submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2408.16258  [pdf, other]

    cs.GR cs.CV

    Advancing Architectural Floorplan Design with Geometry-enhanced Graph Diffusion

    Authors: Sizhe Hu, Wenming Wu, Yuntao Wang, Benzhu Xu, Liping Zheng

    Abstract: Automating architectural floorplan design is vital for housing and interior design, offering a faster, cost-effective alternative to manual sketches by architects. However, existing methods, including rule-based and learning-based approaches, face challenges in design complexity and constrained generation with extensive post-processing, and tend to obvious geometric inconsistencies such as misalig…

    Submitted 29 August, 2024; originally announced August 2024.

  5. arXiv:2408.11172  [pdf, other]

    cs.LG cs.AI cs.CL cs.LO

    SubgoalXL: Subgoal-based Expert Learning for Theorem Proving

    Authors: Xueliang Zhao, Lin Zheng, Haige Bo, Changran Hu, Urmish Thakker, Lingpeng Kong

    Abstract: Formal theorem proving, a field at the intersection of mathematics and computer science, has seen renewed interest with advancements in large language models (LLMs). This paper introduces SubgoalXL, a novel approach that synergizes subgoal-based proofs with expert learning to enhance LLMs' capabilities in formal theorem proving within the Isabelle environment. SubgoalXL addresses two critical chal…

    Submitted 20 August, 2024; originally announced August 2024.

  6. arXiv:2408.08576  [pdf, other]

    cs.CV

    Tuning a SAM-Based Model with Multi-Cognitive Visual Adapter to Remote Sensing Instance Segmentation

    Authors: Linghao Zheng, Xinyang Pu, Feng Xu

    Abstract: The Segment Anything Model (SAM), a foundational model designed for promptable segmentation tasks, demonstrates exceptional generalization capabilities, making it highly promising for natural scene image segmentation. However, SAM's lack of pretraining on massive remote sensing images and its interactive structure limit its automatic mask prediction capabilities. In this paper, a Multi-Cognitive S…

    Submitted 16 August, 2024; originally announced August 2024.

  7. arXiv:2408.07314  [pdf, other]

    cs.LG cs.AI

    Kolmogorov-Arnold Networks (KAN) for Time Series Classification and Robust Analysis

    Authors: Chang Dong, Liangwei Zheng, Weitong Chen, Wei Emma Zhang

    Abstract: Kolmogorov-Arnold Networks (KAN) have recently attracted significant attention as a promising alternative to traditional Multi-Layer Perceptrons (MLP). Despite their theoretical appeal, KAN require validation on large-scale benchmark datasets. Time series data, especially univariate time series, have become increasingly prevalent in recent years and are naturally suited for validating KAN. Therefo…

    Submitted 18 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: 14 pages, 8 figs

    ACM Class: I.2.0

  8. arXiv:2408.07092  [pdf, other]

    cs.LG cs.AI cs.CL

    Post-Training Sparse Attention with Double Sparsity

    Authors: Shuo Yang, Ying Sheng, Joseph E. Gonzalez, Ion Stoica, Lianmin Zheng

    Abstract: The inference process for large language models is slow and memory-intensive, with one of the most critical bottlenecks being excessive Key-Value (KV) cache accesses. This paper introduces "Double Sparsity," a novel post-training sparse attention technique designed to alleviate this bottleneck by reducing KV cache access. Double Sparsity combines token sparsity, which focuses on utilizing only the…

    Submitted 18 August, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

  9. arXiv:2408.07055  [pdf, other]

    cs.CL cs.LG

    LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs

    Authors: Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li

    Abstract: Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation length is inherently bounded by the sample it has seen during supervised fine-tuning (SFT). In other words, their output limitation is due to the scarc…

    Submitted 13 August, 2024; originally announced August 2024.

  10. arXiv:2408.06017  [pdf, other]

    cs.CE

    HyperCAN: Hypernetwork-Driven Deep Parameterized Constitutive Models for Metamaterials

    Authors: Li Zheng, Dennis M. Kochmann, Siddhant Kumar

    Abstract: We introduce HyperCAN, a machine learning framework that utilizes hypernetworks to construct adaptable constitutive artificial neural networks for a wide range of beam-based metamaterials exhibiting diverse mechanical behavior under finite deformations. HyperCAN integrates an input convex network that models the nonlinear stress-strain map of a truss lattice, while ensuring adherence to fundamenta…

    Submitted 12 August, 2024; originally announced August 2024.

  11. arXiv:2408.04168   

    cs.AI

    Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions

    Authors: Qingbin Zeng, Qinglong Yang, Shunan Dong, Heming Du, Liang Zheng, Fengli Xu, Yong Li

    Abstract: This paper considers a scenario in city navigation: an AI agent is provided with language descriptions of the goal location with respect to some well-known landmarks. By only observing the scene around, including recognizing landmarks and road network connections, the agent has to make decisions to navigate to the goal location without instructions. This problem is very challenging, because it req…

    Submitted 5 September, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: The experiment and dataset are not enough, and we need more experiments to verify our model

  12. arXiv:2408.01056  [pdf, other]

    cs.RO

    The NING Humanoid: The Concurrent Design and Development of a Dynamic and Agile Platform

    Authors: Yan Ning, Song Liu, Taiwen Yang, Liang Zheng, Ling Shi

    Abstract: The recent surge of interest in agile humanoid robots achieving dynamic tasks like jumping and flipping necessitates the concurrent design of a robot platform that combines exceptional hardware performance with effective control algorithms. This paper introduces the NING Humanoid, an agile and robust platform aimed at achieving human-like athletic capabilities. The NING humanoid features high-torq…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: This is a workshop paper for ICRA 2024 in Japan. The workshop is Advancements in Trajectory Optimization and Model Predictive Control for Legged System on May 17th 2024, with the URL as: https://atompc-workshop.github.io/

  13. arXiv:2407.17618  [pdf]

    cs.CY

    Productive self/vulnerable body: self-tracking, overworking culture, and conflicted data practices

    Authors: Elise Li Zheng

    Abstract: Self-tracking, the collection, analysis, and interpretation of personal data, signifies an individualized way of health governance as people are demanded to build a responsible self by internalizing norms. However, the technological promises often bear conflicts with various social factors such as a strenuous schedule, a lack of motivation, stress, and anxieties, which fail to deliver health outco…

    Submitted 24 July, 2024; originally announced July 2024.

  14. arXiv:2407.14936  [pdf, other]

    cs.MM

    EidetiCom: A Cross-modal Brain-Computer Semantic Communication Paradigm for Decoding Visual Perception

    Authors: Linfeng Zheng, Peilin Chen, Shiqi Wang

    Abstract: Brain-computer interface (BCI) facilitates direct communication between the human brain and external systems by utilizing brain signals, eliminating the need for conventional communication methods such as speaking, writing, or typing. Nevertheless, the continuous generation of brain signals in BCI frameworks poses challenges for efficient storage and real-time transmission. While considering the h…

    Submitted 20 July, 2024; originally announced July 2024.

  15. arXiv:2407.13218  [pdf, other]

    cs.LG cs.AI

    LiNR: Model Based Neural Retrieval on GPUs at LinkedIn

    Authors: Fedor Borisyuk, Qingquan Song, Mingzhou Zhou, Ganesh Parameswaran, Madhu Arun, Siva Popuri, Tugrul Bingol, Zhuotao Pei, Kuang-Hsuan Lee, Lu Zheng, Qizhan Shao, Ali Naqvi, Sen Zhou, Aman Gupta

    Abstract: This paper introduces LiNR, LinkedIn's large-scale, GPU-based retrieval system. LiNR supports a billion-sized index on GPU models. We discuss our experiences and challenges in creating scalable, differentiable search indexes using TensorFlow and PyTorch at production scale. In LiNR, both items and model weights are integrated into the model binary. Viewing index construction as a form of model tra…

    Submitted 7 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  16. arXiv:2407.09466  [pdf, other]

    cs.RO cs.GR

    TRAVERSE: Traffic-Responsive Autonomous Vehicle Experience & Rare-event Simulation for Enhanced safety

    Authors: Sandeep Thalapanane, Sandip Sharan Senthil Kumar, Guru Nandhan Appiya Dilipkumar Peethambari, Sourang SriHari, Laura Zheng, Julio Poveda, Ming C. Lin

    Abstract: Data for training learning-enabled self-driving cars in the physical world are typically collected in a safe, normal environment. Such data distribution often engenders a strong bias towards safe driving, making self-driving cars unprepared when encountering adversarial scenarios like unexpected accidents. Due to a dearth of such adverse data that is unrealistic for drivers to collect, autonomous…

    Submitted 12 July, 2024; originally announced July 2024.

  17. arXiv:2407.08529  [pdf, other]

    cs.CR

    Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks

    Authors: Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa

    Abstract: Spatiotemporal federated learning has recently attracted intensive study due to its ability to train valuable models with only shared gradients in various location-based services. On the other hand, recent studies have shown that shared gradients may be subject to gradient inversion attacks (GIA) on images or texts. However, so far there has not been any systematic study of the gradient inversion a…

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by DASFAA 2024, 16 pages

  18. arXiv:2407.07443  [pdf, other]

    cs.AI

    Secondary Structure-Guided Novel Protein Sequence Generation with Latent Graph Diffusion

    Authors: Yutong Hu, Yang Tan, Andi Han, Lirong Zheng, Liang Hong, Bingxin Zhou

    Abstract: The advent of deep learning has introduced efficient approaches for de novo protein sequence design, significantly improving success rates and reducing development costs compared to computational or experimental methods. However, existing methods face challenges in generating proteins with diverse lengths and shapes while maintaining key structural features. To address these challenges, we introdu…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 10 pages, 4 figures

  19. arXiv:2407.06951  [pdf, other]

    cs.RO

    RoboCAS: A Benchmark for Robotic Manipulation in Complex Object Arrangement Scenarios

    Authors: Liming Zheng, Feng Yan, Fanfan Liu, Chengjian Feng, Zhuoliang Kang, Lin Ma

    Abstract: Foundation models hold significant potential for enabling robots to perform long-horizon general manipulation tasks. However, the simplicity of tasks and the uniformity of environments in existing benchmarks restrict their effective deployment in complex scenarios. To address this limitation, this paper introduces the RoboCAS benchmark, the first benchmark specifically designed for comple…

    Submitted 9 July, 2024; originally announced July 2024.

  20. arXiv:2407.01601  [pdf, other]

    cs.LG cs.AI

    Unveiling and Controlling Anomalous Attention Distribution in Transformers

    Authors: Ruiqing Yan, Xingbo Du, Haoyu Deng, Linghan Zheng, Qiuzhuang Sun, Jifang Hu, Yuhang Shao, Penghao Jiang, Jinrong Jiang, Lian Zhao

    Abstract: With the advent of large models based on the Transformer architecture, researchers have observed an anomalous phenomenon in the Attention mechanism--there is a very high attention on the first element, which is prevalent across Transformer-based models. It is crucial to understand it for the development of techniques focusing on attention distribution, such as Key-Value (KV) Cache compression and…

    Submitted 3 July, 2024; v1 submitted 26 June, 2024; originally announced July 2024.

  21. arXiv:2406.19755  [pdf, other]

    q-bio.QM cs.AI

    Protein Representation Learning with Sequence Information Embedding: Does it Always Lead to a Better Performance?

    Authors: Yang Tan, Lirong Zheng, Bozitao Zhong, Liang Hong, Bingxin Zhou

    Abstract: Deep learning has become a crucial tool in studying proteins. While the significance of modeling protein structure has been discussed extensively in the literature, amino acid types are typically included in the input as a default operation for many inference tasks. This study demonstrates with a structure alignment task that embedding amino acid types in some cases may not help a deep learning mode…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 8 pages, 4 figures

  22. arXiv:2406.18977  [pdf, other]

    cs.RO cs.CL cs.CV

    RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation

    Authors: Fanfan Liu, Feng Yan, Liming Zheng, Chengjian Feng, Yiyang Huang, Lin Ma

    Abstract: Utilizing Vision-Language Models (VLMs) for robotic manipulation represents a novel paradigm, aiming to enhance the model's ability to generalize to new objects and instructions. However, due to variations in camera specifications and mounting positions, existing methods exhibit significant performance disparities across different robotic platforms. To address this challenge, we propose RoboUniVie…

    Submitted 12 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  23. arXiv:2406.16868  [pdf, other]

    eess.SP cs.AI

    Neural Network-based Two-Dimensional Filtering for OTFS Symbol Detection

    Authors: Jiarui Xu, Karim Said, Lizhong Zheng, Lingjia Liu

    Abstract: Orthogonal time frequency space (OTFS) is a promising modulation scheme for wireless communication in high-mobility scenarios. Recently, a reservoir computing (RC) based approach has been introduced for online subframe-based symbol detection in the OTFS system, where only the limited over-the-air (OTA) pilot symbols are utilized for training. However, the previous RC-based approach does not design…

    Submitted 8 March, 2024; originally announced June 2024.

    Comments: 6 pages, conference paper. arXiv admin note: substantial text overlap with arXiv:2311.08543

  24. arXiv:2406.09908  [pdf, other]

    cs.LG cs.CV

    What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?

    Authors: Weijie Tu, Weijian Deng, Liang Zheng, Tom Gedeon

    Abstract: This work aims to develop a measure that can accurately rank the performance of various classifiers when they are tested on unlabeled data from out-of-distribution (OOD) distributions. We commence by demonstrating that conventional uncertainty metrics, notably the maximum Softmax prediction probability, possess inherent utility in forecasting model generalization across certain OOD contexts. Build…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: TMLR 2024 (https://openreview.net/forum?id=vtiDUgGjyx)

  25. Optimal Kernel Orchestration for Tensor Programs with Korch

    Authors: Muyan Hu, Ashwin Venkatram, Shreyashri Biswas, Balamurugan Marimuthu, Bohan Hou, Gabriele Oliaro, Haojie Wang, Liyan Zheng, Xupeng Miao, Jidong Zhai

    Abstract: Kernel orchestration is the task of mapping the computation defined in different operators of a deep neural network (DNN) to the execution of GPU kernels on modern hardware platforms. Prior approaches optimize kernel orchestration by greedily applying operator fusion, which fuses the computation of multiple operators into a single kernel, and miss a variety of optimization opportunities in kernel…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Fix some typos in the ASPLOS version

    Journal ref: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 3 (2024) 755-769

  26. arXiv:2406.09257  [pdf, other]

    cs.LG cs.CV

    Assessing Model Generalization in Vicinity

    Authors: Yuchi Liu, Yifan Sun, Jingdong Wang, Liang Zheng

    Abstract: This paper evaluates the generalization ability of classification models on out-of-distribution test sets without depending on ground truth labels. Common approaches often calculate an unsupervised metric related to a specific model property, like confidence or invariance, which correlates with out-of-distribution accuracy. However, these metrics are typically computed for each test sample individ…

    Submitted 13 June, 2024; originally announced June 2024.

  27. arXiv:2406.09187  [pdf, other]

    cs.LG

    GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

    Authors: Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, Dawn Song, Bo Li

    Abstract: The rapid advancement of large language models (LLMs) has catalyzed the deployment of LLM-powered agents across numerous applications, raising new concerns regarding their safety and trustworthiness. Existing methods for enhancing the safety of LLMs are not directly transferable to LLM-powered agents due to their diverse objectives and output modalities. In this paper, we propose GuardAgent, the f…

    Submitted 13 June, 2024; originally announced June 2024.

  28. arXiv:2406.06977  [pdf, other]

    cs.LG cs.DB

    Cross-domain-aware Worker Selection with Training for Crowdsourced Annotation

    Authors: Yushi Sun, Jiachuan Wang, Peng Cheng, Libin Zheng, Lei Chen, Jian Yin

    Abstract: Annotation through crowdsourcing, which relies on an effective selection scheme given a pool of workers, draws increasing attention. Existing methods propose to select workers based on their performance on tasks with ground truth, while two important points are missed. 1) The historical performances of workers in other tasks. In real-world scenarios, workers need to solve a new task whose correlat…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ICDE 2024

  29. arXiv:2406.06776  [pdf, other]

    cs.CV cs.LG

    SeeFar: Satellite Agnostic Multi-Resolution Dataset for Geospatial Foundation Models

    Authors: James Lowman, Kelly Liu Zheng, Roydon Fraser, Jesse Van Griensven The, Mojtaba Valipour

    Abstract: SeeFar is an evolving collection of multi-resolution satellite images from public and commercial satellites. We specifically curated this dataset for training geospatial foundation models, unconstrained by satellite type. In recent years, advances in technology have made satellite imagery more accessible than ever. More earth-observing satellites have been launched in the last five years than in t…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Work in Progress!

  30. arXiv:2406.06475  [pdf, other]

    cs.IR cs.AI

    Survey for Landing Generative AI in Social and E-commerce Recsys -- the Industry Perspectives

    Authors: Da Xu, Danqing Zhang, Guangyu Yang, Bo Yang, Shuyuan Xu, Lingling Zheng, Cindy Liang

    Abstract: Recently, generative AI (GAI), with its emerging capabilities, has presented unique opportunities for augmenting and revolutionizing industrial recommender systems (Recsys). Despite growing research efforts at the intersection of these fields, the integration of GAI into industrial Recsys remains in its infancy, largely due to the intricate nature of modern industrial Recsys infrastructure, ope…

    Submitted 10 June, 2024; originally announced June 2024.

  31. arXiv:2406.05375  [pdf, other]

    cs.AI cs.LG

    LEMMA-RCA: A Large Multi-modal Multi-domain Dataset for Root Cause Analysis

    Authors: Lecheng Zheng, Zhengzhang Chen, Dongjie Wang, Chengyuan Deng, Reon Matsuoka, Haifeng Chen

    Abstract: Root cause analysis (RCA) is crucial for enhancing the reliability and performance of complex systems. However, progress in this field has been hindered by the lack of large-scale, open-source datasets tailored for RCA. To bridge this gap, we introduce LEMMA-RCA, a large dataset designed for diverse RCA tasks across multiple domains and modalities. LEMMA-RCA features various real-world fault scena…

    Submitted 8 June, 2024; originally announced June 2024.

  32. arXiv:2406.04314  [pdf, other]

    cs.CV

    Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

    Authors: Zhanhao Liang, Yuhui Yuan, Shuyang Gu, Bohan Chen, Tiankai Hang, Ji Li, Liang Zheng

    Abstract: Recently, Direct Preference Optimization (DPO) has extended its success from aligning large language models (LLMs) to aligning text-to-image diffusion models with human preferences. Unlike most existing DPO methods that assume all diffusion steps share a consistent preference order with the final generated images, we argue that this assumption neglects step-specific denoising performance and that…

    Submitted 6 June, 2024; originally announced June 2024.

  33. arXiv:2406.01431  [pdf, other]

    cs.RO

    Deep Stochastic Kinematic Models for Probabilistic Motion Forecasting in Traffic

    Authors: Laura Zheng, Sanghyun Son, Jing Liang, Xijun Wang, Brian Clipp, Ming C. Lin

    Abstract: In trajectory forecasting tasks for traffic, future output trajectories can be computed by advancing the ego vehicle's state with predicted actions according to a kinematics model. By unrolling predicted trajectories via time integration and models of kinematic dynamics, predicted trajectories should not only be kinematically feasible but also relate uncertainty from one timestep to the next. Whil…

    Submitted 17 August, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 8 pages

  34. arXiv:2406.01425  [pdf, other]

    cs.CV

    Sensitivity-Informed Augmentation for Robust Segmentation

    Authors: Laura Zheng, Wenjie Wei, Tony Wu, Jacob Clements, Shreelekha Revankar, Andre Harrison, Yu Shen, Ming C. Lin

    Abstract: Segmentation is an integral module in many visual computing applications such as virtual try-on, medical imaging, autonomous driving, and agricultural automation. These applications often involve either widespread consumer use or highly variable environments, both of which can degrade the quality of visual sensor data, whether from a common mobile phone or an expensive satellite imaging camera. In…

    Submitted 16 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages

  35. arXiv:2405.20252  [pdf, other]

    cs.CL

    Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization

    Authors: Yuchi Liu, Jaskirat Singh, Gaowen Liu, Ali Payani, Liang Zheng

    Abstract: Large language models (LLMs) have shown great progress in responding to user questions, allowing for a multitude of diverse applications. Yet, the quality of LLM outputs heavily depends on the prompt design, where a good prompt might enable the LLM to answer a very challenging question correctly. Therefore, recent works have developed many strategies for improving the prompt, including both manual…

    Submitted 30 May, 2024; originally announced May 2024.

  36. arXiv:2405.15013  [pdf, other]

    cs.LG

    Make Inference Faster: Efficient GPU Memory Management for Butterfly Sparse Matrix Multiplication

    Authors: Antoine Gonon, Léon Zheng, Pascal Carrivain, Quoc-Tung Le

    Abstract: This paper is the first to assess the state of existing sparse matrix multiplication algorithms on GPU for the butterfly structure, a promising form of sparsity. This is achieved through a comprehensive benchmark that can be easily modified to add a new implementation. The goal is to provide a simple tool for users to select the optimal implementation based on their settings. Using this benchmark,…

    Submitted 23 May, 2024; originally announced May 2024.

  37. arXiv:2405.14359  [pdf, other]

    cs.IR

    Look into the Future: Deep Contextualized Sequential Recommendation

    Authors: Lei Zheng, Ning Li, Yanhuan Huang, Ruiwen Xu, Weinan Zhang, Yong Yu

    Abstract: Sequential recommendation aims to estimate how a user's interests evolve over time via uncovering valuable patterns from user behavior history. Many previous sequential models have solely relied on users' historical information to model the evolution of their interests, neglecting the crucial role that future information plays in accurately capturing these dynamics. However, effectively incorporat…

    Submitted 14 August, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2404.18304 by other authors

  38. arXiv:2405.13548  [pdf, other]

    cs.SE cs.CL

    ECLIPSE: Semantic Entropy-LCS for Cross-Lingual Industrial Log Parsing

    Authors: Wei Zhang, Xianfu Cheng, Yi Zhang, Jian Yang, Hongcheng Guo, Zhoujun Li, Xiaolin Yin, Xiangyuan Guan, Xu Shi, Liangfan Zheng, Bo Zhang

    Abstract: Log parsing, a vital task for interpreting the vast and complex data produced within software architectures, faces significant challenges in the transition from academic benchmarks to the industrial domain. Existing log parsers, while highly effective on standardized public datasets, struggle to maintain performance and efficiency when confronted with the sheer scale and diversity of real-world ind…

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  39. arXiv:2405.12503  [pdf, other]

    cs.CV

    CLRKDNet: Speeding up Lane Detection with Knowledge Distillation

    Authors: Weiqing Qi, Guoyang Zhao, Fulong Ma, Linwei Zheng, Ming Liu

    Abstract: Road lanes are integral components of the visual perception systems in intelligent vehicles, playing a pivotal role in safe navigation. In lane detection tasks, balancing accuracy with real-time performance is essential, yet existing methods often sacrifice one for the other. To address this trade-off, we introduce CLRKDNet, a streamlined model that balances detection accuracy with real-time perfo…

    Submitted 21 May, 2024; originally announced May 2024.

  40. arXiv:2405.12107  [pdf, other]

    cs.CV cs.CL

    Imp: Highly Capable Large Multimodal Models for Mobile Devices

    Authors: Zhenwei Shao, Zhou Yu, Jun Yu, Xuecheng Ouyang, Lihao Zheng, Zhenbiao Gai, Mingyang Wang, Jiajun Ding

    Abstract: By harnessing the capabilities of large language models (LLMs), recent large multimodal models (LMMs) have shown remarkable versatility in open-world multimodal understanding. Nevertheless, they are usually parameter-heavy and computation-intensive, thus hindering their applicability in resource-constrained scenarios. To this end, several lightweight LMMs have been proposed successively to maximiz…

    Submitted 29 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Fix some typos and correct a few numbers in the tables

  41. arXiv:2405.09523  [pdf, ps, other]

    math.ST cs.IT

    On Semi-supervised Estimation of Discrete Distributions under f-divergences

    Authors: Hasan Sabri Melihcan Erol, Lizhong Zheng

    Abstract: We study the problem of estimating the joint probability mass function (pmf) over two random variables. In particular, the estimation is based on the observation of $m$ samples containing both variables and $n$ samples missing one fixed variable. We adopt the minimax framework with $l^p_p$ loss functions. Recent work established that univariate minimax estimator combinations achieve minimax risk w…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Full version. Presented in ISIT-24. arXiv admin note: text overlap with arXiv:2305.07955

  42. arXiv:2405.07029  [pdf]

    cs.SD eess.AS

    A framework of text-dependent speaker verification for Chinese numerical string corpus

    Authors: Litong Zheng, Feng Hong, Weijie Xu, Wan Zheng

    Abstract: The Chinese numerical string corpus serves as a valuable resource for speaker verification, particularly in financial transactions. Research indicates that in short speech scenarios, text-dependent speaker verification (TD-SV) consistently outperforms text-independent speaker verification (TI-SV). However, TD-SV potentially includes the validation of text information, which can be negatively impa…

    Submitted 21 May, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.01645

  43. arXiv:2404.15678  [pdf, other]

    cs.IR cs.AI

    Retrieval and Distill: A Temporal Data Shift-Free Paradigm for Online Recommendation System

    Authors: Lei Zheng, Ning Li, Weinan Zhang, Yong Yu

    Abstract: Current recommendation systems are significantly affected by a serious issue of temporal data shift, which is the inconsistency between the distribution of historical data and that of online data. Most existing models focus on utilizing updated data, overlooking the transferable, temporal data shift-free information that can be learned from shifting data. We propose the Temporal Invariance of Asso…

    Submitted 13 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  44. arXiv:2404.14850  [pdf, other]

    cs.CL cs.LG q-bio.BM

    Simple, Efficient and Scalable Structure-aware Adapter Boosts Protein Language Models

    Authors: Yang Tan, Mingchen Li, Bingxin Zhou, Bozitao Zhong, Lirong Zheng, Pan Tan, Ziyi Zhou, Huiqun Yu, Guisheng Fan, Liang Hong

    Abstract: Fine-tuning Pre-trained protein language models (PLMs) has emerged as a prominent strategy for enhancing downstream prediction tasks, often outperforming traditional supervised learning approaches. As a widely applied powerful technique in natural language processing, employing Parameter-Efficient Fine-Tuning techniques could potentially enhance the performance of PLMs. However, the direct transfe…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 30 pages, 4 figures, 8 tables

  45. arXiv:2404.13016  [pdf, other]

    cs.CV cs.LG stat.ML

    Optimizing Calibration by Gaining Aware of Prediction Correctness

    Authors: Yuchi Liu, Lei Wang, Yuli Zou, James Zou, Liang Zheng

    Abstract: Model calibration aims to align confidence with prediction correctness. The Cross-Entropy (CE) loss is widely used for calibrator training, which enforces the model to increase confidence on the ground truth class. However, we find the CE loss has intrinsic limitations. For example, for a narrow misclassification, a calibrator trained by the CE loss often produces high confidence on the wrongly pr…

    Submitted 24 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  46. arXiv:2404.12135  [pdf, other]

    cs.MA cs.CR cs.DC

    mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture

    Authors: Wei Zhang, Hongcheng Guo, Jian Yang, Yi Zhang, Chaoran Yan, Zhoujin Tian, Hangyuan Ji, Zhoujun Li, Tongliang Li, Tieqiao Zheng, Chao Chen, Yi Liang, Xu Shi, Liangfan Zheng, Bo Zhang

    Abstract: The escalating complexity of micro-services architecture in cloud-native technologies poses significant challenges for maintaining system stability and efficiency. To conduct root cause analysis (RCA) and resolution of alert events, we propose a pioneering framework, multi-Agent Blockchain-inspired Collaboration for root cause analysis in micro-services architecture (mABC), to revolutionize the AI…

    Submitted 3 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  47. arXiv:2404.11943  [pdf, other]

    cs.HC

    AgentCoord: Visually Exploring Coordination Strategy for LLM-based Multi-Agent Collaboration

    Authors: Bo Pan, Jiaying Lu, Ke Wang, Li Zheng, Zhen Wen, Yingchaojie Feng, Minfeng Zhu, Wei Chen

    Abstract: The potential of automatic task-solving through Large Language Model (LLM)-based multi-agent collaboration has recently garnered widespread attention from both the research community and industry. While utilizing natural language to coordinate multiple agents presents a promising avenue for democratizing agent technology for general users, designing coordination strategies remains challenging with…

    Submitted 18 April, 2024; originally announced April 2024.

  48. arXiv:2404.11139  [pdf, other]

    cs.CV

    GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement

    Authors: Linfang Zheng, Tze Ho Elden Tse, Chen Wang, Yinghan Sun, Hua Chen, Ales Leonardis, Wei Zhang

    Abstract: Object pose refinement is essential for robust object pose estimation. Previous work has made significant progress towards instance-level object pose refinement. Yet, category-level pose refinement is a more challenging problem due to large shape variations within a category and the discrepancies between the target object and the shape prior. To address these challenges, we introduce a novel archi…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

  49. arXiv:2404.09432  [pdf, other]

    cs.CV cs.AI cs.LG

    The 8th AI City Challenge

    Authors: Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty, Sanjita Prajapati, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Fady Alnajjar, Ganzorig Batnasan, Ping-Yang Chen, Jun-Wei Hsieh, Xunlei Wu, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Rama Chellappa

    Abstract: The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC)…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Summary of the 8th AI City Challenge Workshop in conjunction with CVPR 2024

  50. arXiv:2404.06860  [pdf, other]

    cs.CV

    Monocular 3D lane detection for Autonomous Driving: Recent Achievements, Challenges, and Outlooks

    Authors: Fulong Ma, Weiqing Qi, Guoyang Zhao, Linwei Zheng, Sheng Wang, Yuxuan Liu, Ming Liu

    Abstract: 3D lane detection is essential in autonomous driving as it extracts structural and traffic information from the road in three-dimensional space, aiding self-driving cars in logical, safe, and comfortable path planning and motion control. Given the cost of sensors and the advantages of visual data in color information, 3D lane detection based on monocular vision is an important research direction i…

    Submitted 19 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.