
Showing 1–50 of 1,191 results for author: Li, P

Searching in archive cs.
  1. arXiv:2409.11307

    cs.CV

    GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module

    Authors: Yichen Zhang, Zihan Wang, Jiali Han, Peilin Li, Jiaxun Zhang, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: 3D Gaussian Splatting (3DGS) integrates the strengths of primitive-based representations and volumetric rendering techniques, enabling real-time, high-quality rendering. However, 3DGS models typically overfit to single-scene training and are highly sensitive to the initialization of Gaussian ellipsoids, heuristically derived from Structure from Motion (SfM) point clouds, which limits both generali…

    Submitted 17 September, 2024; originally announced September 2024.

  2. arXiv:2409.11230

    cs.RO

    Resilient and Adaptive Replanning for Multi-Robot Target Tracking with Sensing and Communication Danger Zones

    Authors: Peihan Li, Yuwei Wu, Jiazhen Liu, Gaurav S. Sukhatme, Vijay Kumar, Lifeng Zhou

    Abstract: Multi-robot collaboration for target tracking presents significant challenges in hazardous environments, including addressing robot failures, dynamic priority changes, and other unpredictable factors. Moreover, these challenges are increased in adversarial settings if the environment is unknown. In this paper, we propose a resilient and adaptive framework for multi-robot, multi-target tracking in…

    Submitted 17 September, 2024; originally announced September 2024.

  3. arXiv:2409.10899

    cs.DM math.CO

    Conflict-free chromatic index of trees

    Authors: Shanshan Guo, Ethan Y. H. Li, Luyi Li, Ping Li

    Abstract: A graph $G$ is conflict-free $k$-edge-colorable if there exists an assignment of $k$ colors to $E(G)$ such that for every edge $e\in E(G)$, there is a color that is assigned to exactly one edge among the closed neighborhood of $e$. The smallest $k$ such that $G$ is conflict-free $k$-edge-colorable is called the conflict-free chromatic index of $G$, denoted $\chi'_{CF}(G)$. Dębski and Przybyło sho…

    Submitted 17 September, 2024; originally announced September 2024.

  4. arXiv:2409.10141

    cs.CV

    PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion

    Authors: Peng Li, Wangguandong Zheng, Yuan Liu, Tao Yu, Yangguang Li, Xingqun Qi, Mengfei Li, Xiaowei Chi, Siyu Xia, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: Detailed and photorealistic 3D human modeling is essential for various applications and has seen tremendous progress. However, full-body reconstruction from a monocular RGB image remains challenging due to the ill-posed nature of the problem and sophisticated clothing topology with self-occlusions. In this paper, we propose PSHuman, a novel framework that explicitly reconstructs human meshes utili…

    Submitted 16 September, 2024; originally announced September 2024.

  5. arXiv:2409.09564

    cs.CV cs.AI

    TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings

    Authors: Dawei Yan, Pengcheng Li, Yang Li, Hao Chen, Qingguo Chen, Weihua Luo, Wei Dong, Qingsen Yan, Haokui Zhang, Chunhua Shen

    Abstract: Currently, inspired by the success of vision-language models (VLMs), an increasing number of researchers are focusing on improving VLMs and have achieved promising results. However, most existing methods concentrate on optimizing the connector and enhancing the language model component, while neglecting improvements to the vision encoder itself. In contrast, we propose Text Guided LLaVA (TG-LLaVA)…

    Submitted 14 September, 2024; originally announced September 2024.

  6. arXiv:2409.09295

    cs.RO

    GEVO: Memory-Efficient Monocular Visual Odometry Using Gaussians

    Authors: Dasong Gao, Peter Zhi Xuan Li, Vivienne Sze, Sertac Karaman

    Abstract: Constructing a high-fidelity representation of the 3D scene using a monocular camera can enable a wide range of applications on mobile devices, such as micro-robots, smartphones, and AR/VR headsets. On these devices, memory is often limited in capacity and its access often dominates the consumption of compute energy. Although Gaussian Splatting (GS) allows for high-fidelity reconstruction of 3D sc…

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: 8 pages

  7. arXiv:2409.06078

    cs.RO eess.SY

    PEERNet: An End-to-End Profiling Tool for Real-Time Networked Robotic Systems

    Authors: Aditya Narayanan, Pranav Kasibhatla, Minkyu Choi, Po-han Li, Ruihan Zhao, Sandeep Chinchali

    Abstract: Networked robotic systems balance compute, power, and latency constraints in applications such as self-driving vehicles, drone swarms, and teleoperated surgery. A core problem in this domain is deciding when to offload a computationally expensive task to the cloud, a remote server, at the cost of communication latency. Task offloading algorithms often rely on precise knowledge of system-specific p…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: Accepted at IROS 2024

  8. arXiv:2409.04637

    quant-ph cs.AI cs.CR cs.LG

    Enhancing Quantum Security over Federated Learning via Post-Quantum Cryptography

    Authors: Pingzhi Li, Tianlong Chen, Junyu Liu

    Abstract: Federated learning (FL) has become one of the standard approaches for deploying machine learning models on edge devices, where private training data are distributed across clients, and a shared model is learned by aggregating locally computed updates from each client. While this paradigm enhances communication efficiency by only requiring updates at the end of each training epoch, the transmitted…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: Submission for 2024 IEEE Workshop on Quantum IntelLigence, Learning & Security (QUILLS), https://sites.google.com/pitt.edu/quills/home

  9. arXiv:2409.04009

    cs.CL

    Large Margin Prototypical Network for Few-shot Relation Classification with Fine-grained Features

    Authors: Miao Fan, Yeqi Bai, Mingming Sun, Ping Li

    Abstract: Relation classification (RC) plays a pivotal role in both natural language understanding and knowledge graph completion. It is generally formulated as a task to recognize the relationship between two entities of interest appearing in a free-text sentence. Conventional approaches on RC, regardless of feature engineering or deep learning based, can obtain promising performance on categorizing common…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by CIKM'19

  10. arXiv:2409.03449

    cs.IR

    MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu's Sponsored Search

    Authors: Miao Fan, Jiacheng Guo, Shuai Zhu, Shuo Miao, Mingming Sun, Ping Li

    Abstract: Baidu runs the largest commercial web search engine in China, serving hundreds of millions of online users every day in response to a great variety of queries. In order to build a high-efficiency sponsored search engine, we used to adopt a three-layer funnel-shaped structure to screen and sort hundreds of ads from billions of ad candidates subject to the requirement of low response latency and the…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by KDD'19

  11. arXiv:2409.03272

    cs.CV cs.RO

    OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving

    Authors: Julong Wei, Shanshuai Yuan, Pengfei Li, Qingda Hu, Zhongxue Gan, Wenchao Ding

    Abstract: The rise of multi-modal large language models (MLLMs) has spurred their applications in autonomous driving. Recent MLLM-based methods perform action by learning a direct mapping from perception to action, neglecting the dynamics of the world and the relations between action and world dynamics. In contrast, human beings possess a world model that enables them to simulate the future states based on 3D…

    Submitted 5 September, 2024; originally announced September 2024.

  12. arXiv:2409.02919

    cs.CV

    HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts

    Authors: Xinyu Liu, Yingqing He, Lanqing Guo, Xiang Li, Bu Jin, Peng Li, Yan Li, Chi-Min Chan, Qifeng Chen, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: The potential for higher-resolution image generation using pretrained diffusion models is immense, yet these models often struggle with issues of object repetition and structural artifacts, especially when scaling to 4K resolution and higher. We find that the problem arises because a single prompt for the generation of multiple scales provides insufficient efficacy. In response, we propos…

    Submitted 9 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: https://liuxinyv.github.io/HiPrompt/

  13. arXiv:2409.02849

    cs.NI

    Anomaly Detection in Offshore Open Radio Access Network Using Long Short-Term Memory Models on a Novel Artificial Intelligence-Driven Cloud-Native Data Platform

    Authors: Abdelrahim Ahmad, Peizheng Li, Robert Piechocki, Rui Inacio

    Abstract: The radio access network (RAN) is a critical component of modern telecom infrastructure, currently undergoing significant transformation towards disaggregated and open architectures. These advancements are pivotal for integrating intelligent, data-driven applications aimed at enhancing network reliability and operational autonomy through the introduction of cognition capabilities, exemplified by t…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 16 pages, 12 figures. This manuscript has been submitted to Elsevier for possible publication

  14. arXiv:2409.02139

    cs.LG cs.AI cs.CR

    The Role of Transformer Models in Advancing Blockchain Technology: A Systematic Survey

    Authors: Tianxu Liu, Yanbin Wang, Jianguo Sun, Ye Tian, Yanyu Huang, Tao Xue, Peiyue Li, Yiwei Liu

    Abstract: As blockchain technology rapidly evolves, the demand for enhanced efficiency, security, and scalability grows. Transformer models, as powerful deep learning architectures, have shown unprecedented potential in addressing various blockchain challenges. However, a systematic review of Transformer applications in blockchain is lacking. This paper aims to fill this research gap by surveying over 200 rel…

    Submitted 5 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

  15. arXiv:2409.01515

    cs.CY

    METcross: A framework for short-term forecasting of cross-city metro passenger flow

    Authors: Wenbo Lu, Jinhua Xu, Peikun Li, Ting Wang, Yong Zhang

    Abstract: Metro operation management relies on accurate predictions of passenger flow in the future. This study begins by integrating cross-city (including source and target city) knowledge and developing a short-term passenger flow prediction framework (METcross) for the metro. Firstly, we propose a basic framework for modeling cross-city metro passenger flow prediction from the perspectives of data fusion…

    Submitted 2 September, 2024; originally announced September 2024.

  16. arXiv:2409.00973

    cs.CV

    IVGF: The Fusion-Guided Infrared and Visible General Framework

    Authors: Fangcen Liu, Chenqiang Gao, Fang Chen, Pengcheng Li, Junjie Guo, Deyu Meng

    Abstract: Infrared and visible dual-modality tasks such as semantic segmentation and object detection can achieve robust performance even in extreme scenes by fusing complementary information. Most current methods design task-specific frameworks, which are limited in generalization across multiple tasks. In this paper, we propose a fusion-guided infrared and visible general framework, IVGF, which can be eas…

    Submitted 14 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: 11 pages, 8 figures

  17. arXiv:2409.00614

    cs.CL cs.AI

    DAMe: Personalized Federated Social Event Detection with Dual Aggregation Mechanism

    Authors: Xiaoyan Yu, Yifan Wei, Pu Li, Shuaishuai Zhou, Hao Peng, Li Sun, Liehuang Zhu, Philip S. Yu

    Abstract: Training social event detection models through federated learning (FedSED) aims to improve participants' performance on the task. However, existing federated learning paradigms are inadequate for achieving FedSED's objective and exhibit limitations in handling the inherent heterogeneity in social data. This paper proposes a personalized federated learning framework with a dual aggregation mechanis…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: CIKM 2024

  18. arXiv:2408.16031

    cs.LG cs.AI

    EMP: Enhance Memory in Data Pruning

    Authors: Jinying Xiao, Ping Li, Jie Nie, Zhe Tang

    Abstract: Recently, large language and vision models have shown strong performance, but due to high pre-training and fine-tuning costs, research has shifted towards faster training via dataset pruning. Previous methods used sample loss as an evaluation criterion, aiming to select the most "difficult" samples for training. However, when the pruning rate increases, the number of times each sample is trained b…

    Submitted 28 August, 2024; originally announced August 2024.

  19. arXiv:2408.14368

    cs.RO cs.AI

    GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy

    Authors: Peiyan Li, Hongtao Wu, Yan Huang, Chilam Cheang, Liang Wang, Tao Kong

    Abstract: The robotics community has consistently aimed to achieve generalizable robot manipulation with flexible natural language instructions. One of the primary challenges is that obtaining robot data fully annotated with both actions and texts is time-consuming and labor-intensive. However, partially annotated data, such as human activity videos without action labels and robot play data without language…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 9 pages, 7 figures, letter

  20. arXiv:2408.14189

    cs.CV

    EMDFNet: Efficient Multi-scale and Diverse Feature Network for Traffic Sign Detection

    Authors: Pengyu Li, Chenhe Liu, Tengfei Li, Xinyu Wang, Shihui Zhang, Dongyang Yu

    Abstract: The detection of small objects, particularly traffic signs, is a critical subtask within object detection and autonomous driving. Despite the notable achievements in previous research, two primary challenges persist. Firstly, the main issue is the singleness of feature extraction. Secondly, the detection process fails to effectively integrate with objects of varying sizes or scales. These issues a…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 15 pages, 5 figures, accepted to ICANN

  21. arXiv:2408.13724

    cs.CV cs.RO

    PhysPart: Physically Plausible Part Completion for Interactable Objects

    Authors: Rundong Luo, Haoran Geng, Congyue Deng, Puhao Li, Zan Wang, Baoxiong Jia, Leonidas Guibas, Siyuan Huang

    Abstract: Interactable objects are ubiquitous in our daily lives. Recent advances in 3D generative models make it possible to automate the modeling of these objects, benefiting a range of applications from 3D printing to the creation of robot simulation environments. However, while significant progress has been made in modeling 3D shapes and appearances, modeling object physics, particularly for interactabl…

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

  22. arXiv:2408.12142

    cs.CL cs.AI

    MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents

    Authors: Congchi Yin, Feng Li, Shu Zhang, Zike Wang, Jun Shao, Piji Li, Jianhua Chen, Xun Jiang

    Abstract: The clinical diagnosis of most mental disorders primarily relies on the conversations between psychiatrist and patient. The creation of such diagnostic conversation datasets is promising to boost the AI mental healthcare community. However, directly collecting the conversations in real diagnosis scenarios is nearly impossible due to stringent privacy and ethical considerations. To address this issue…

    Submitted 22 August, 2024; originally announced August 2024.

  23. Timeline and Boundary Guided Diffusion Network for Video Shadow Detection

    Authors: Haipeng Zhou, Honqiu Wang, Tian Ye, Zhaohu Xing, Jun Ma, Ping Li, Qiong Wang, Lei Zhu

    Abstract: Video Shadow Detection (VSD) aims to detect the shadow masks with frame sequence. Existing works suffer from inefficient temporal learning. Moreover, few works address the VSD problem by considering the characteristic (i.e., boundary) of shadow. Motivated by this, we propose a Timeline and Boundary Guided Diffusion (TBGDiff) network for VSD where we take account of the past-future temporal guidanc…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: ACM MM2024

  24. arXiv:2408.11280

    cs.CV

    Exploring Scene Coherence for Semi-Supervised 3D Semantic Segmentation

    Authors: Chuandong Liu, Shuguo Jiang, Xingxing Weng, Lei Yu, Pengcheng Li, Gui-Song Xia

    Abstract: Semi-supervised semantic segmentation, which efficiently addresses the limitation of acquiring dense annotations, is essential for 3D scene understanding. Most methods leverage the teacher model to generate pseudo labels, and then guide the learning of the student model on unlabeled scenes. However, they focus only on points with pseudo labels while directly overlooking points without pseudo label…

    Submitted 20 August, 2024; originally announced August 2024.

  25. arXiv:2408.08518

    cs.CV

    Visual-Friendly Concept Protection via Selective Adversarial Perturbations

    Authors: Xiaoyue Mi, Fan Tang, Juan Cao, Peng Li, Yang Liu

    Abstract: Personalized concept generation by tuning diffusion models with a few images raises potential legal and ethical concerns regarding privacy and intellectual property rights. Researchers attempt to prevent malicious personalization using adversarial perturbations. However, previous efforts have mainly focused on the effectiveness of protection while neglecting the visibility of perturbations. They u…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Under Review

  26. arXiv:2408.08147

    cs.DC cs.CL cs.LG

    P/D-Serve: Serving Disaggregated Large Language Model at Scale

    Authors: Yibo Jin, Tao Wang, Huimin Lin, Mingyang Song, Peiyang Li, Yipeng Ma, Yicheng Shan, Zhengfan Yuan, Cailong Li, Yajing Sun, Tiandeng Wu, Xing Chu, Ruizhi Huan, Li Ma, Xiao You, Wenting Zhou, Yunpeng Ye, Wen Liu, Xiangkun Xu, Yongsheng Zhang, Tiantian Dong, Jiawei Zhu, Zhe Wang, Xijian Ju, Jianxun Song, et al. (5 additional authors not shown)

    Abstract: Serving disaggregated large language models (LLMs) over tens of thousands of xPU devices (GPUs or NPUs) with reliable performance faces multiple challenges. 1) Ignoring the diversity (various prefixes and tidal requests), treating all the prompts in a mixed pool is inadequate. To facilitate the similarity per scenario and minimize the inner mismatch on P/D (prefill and decoding) processing, fine-g…

    Submitted 15 August, 2024; originally announced August 2024.

  27. arXiv:2408.07613

    cs.CV

    Rethinking the Key Factors for the Generalization of Remote Sensing Stereo Matching Networks

    Authors: Liting Jiang, Feng Wang, Wenyi Zhang, Peifeng Li, Hongjian You, Yuming Xiang

    Abstract: Stereo matching, a critical step of 3D reconstruction, has fully shifted towards deep learning due to its strong feature representation of remote sensing images. However, ground truth for stereo matching task relies on expensive airborne LiDAR data, thus making it difficult to obtain enough samples for supervised learning. To improve the generalization ability of stereo matching networks on cross-…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Submitted to IEEE JSTARS

  28. arXiv:2408.06634

    q-fin.CP cs.AI cs.CL cs.LG q-fin.ST

    Harnessing Earnings Reports for Stock Predictions: A QLoRA-Enhanced LLM Approach

    Authors: Haowei Ni, Shuchen Meng, Xupeng Chen, Ziqing Zhao, Andi Chen, Panfeng Li, Shiyao Zhang, Qifu Yin, Yuanqing Wang, Yuxi Chan

    Abstract: Accurate stock market predictions following earnings reports are crucial for investors. Traditional methods, particularly classical machine learning models, struggle with these predictions because they cannot effectively process and interpret extensive textual data contained in earnings reports and often overlook nuances that influence market movements. This paper introduces an advanced approach b…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted by 2024 6th International Conference on Data-driven Optimization of Complex Systems

  29. arXiv:2408.05575

    cs.AI cs.GT

    In-Context Exploiter for Extensive-Form Games

    Authors: Shuxin Li, Chang Yang, Youzhi Zhang, Pengdeng Li, Xinrun Wang, Xiao Huang, Hau Chan, Bo An

    Abstract: Nash equilibrium (NE) is a widely adopted solution concept in game theory due to its stability property. However, we observe that the NE strategy might not always yield the best results, especially against opponents who do not adhere to NE strategies. Based on this observation, we pose a new game-solving question: Can we learn a model that can exploit any, even NE, opponent to maximize their own u…

    Submitted 10 August, 2024; originally announced August 2024.

  30. arXiv:2408.04268

    cs.CV cs.AI cs.LG

    Evaluating Modern Approaches in 3D Scene Reconstruction: NeRF vs Gaussian-Based Methods

    Authors: Yiming Zhou, Zixuan Zeng, Andi Chen, Xiaofan Zhou, Haowei Ni, Shiyao Zhang, Panfeng Li, Liangxi Liu, Mengyao Zheng, Xupeng Chen

    Abstract: Exploring the capabilities of Neural Radiance Fields (NeRF) and Gaussian-based methods in the context of 3D scene reconstruction, this study contrasts these modern approaches with traditional Simultaneous Localization and Mapping (SLAM) systems. Utilizing datasets such as Replica and ScanNet, we assess performance based on tracking accuracy, mapping fidelity, and view synthesis. Findings reveal th…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by 2024 6th International Conference on Data-driven Optimization of Complex Systems

  31. arXiv:2408.04034

    cs.CV

    Task-oriented Sequential Grounding in 3D Scenes

    Authors: Zhuofan Zhang, Ziyu Zhu, Pengxiang Li, Tengyu Liu, Xiaojian Ma, Yixin Chen, Baoxiong Jia, Siyuan Huang, Qing Li

    Abstract: Grounding natural language in physical 3D environments is essential for the advancement of embodied artificial intelligence. Current datasets and models for 3D visual grounding predominantly focus on identifying and localizing objects from static, object-centric descriptions. These approaches do not adequately address the dynamic and sequential nature of task-oriented grounding necessary for pract…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: website: https://sg-3d.github.io/

  32. arXiv:2408.03728

    cs.LG math.OC

    A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models

    Authors: Pengxiang Zhao, Hanyu Hu, Ping Li, Yi Zheng, Zhefeng Wang, Xiaoming Yuan

    Abstract: Pruning is a critical strategy for compressing trained large language models (LLMs), aiming at substantial memory conservation and computational acceleration without compromising performance. However, existing pruning methods often necessitate inefficient retraining for billion-scale LLMs or rely on heuristic methods such as the optimal brain surgeon framework, which degrade performance. In this p…

    Submitted 7 August, 2024; originally announced August 2024.

  33. arXiv:2408.02907

    cs.CL

    Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering

    Authors: Tiezheng Guo, Chen Wang, Yanyi Liu, Jiawei Tang, Pan Li, Sai Xu, Qingwen Yang, Xianlin Gao, Zhi Li, Yingyou Wen

    Abstract: Retrieving external knowledge and prompting large language models with relevant information is an effective paradigm to enhance the performance of question-answering tasks. Previous research typically handles paragraphs from external documents in isolation, resulting in a lack of context and ambiguous references, particularly in multi-document and complex tasks. To overcome these challenges, we pr…

    Submitted 5 August, 2024; originally announced August 2024.

  34. arXiv:2407.21783

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  35. arXiv:2407.20912

    cs.LG

    What Are Good Positional Encodings for Directed Graphs?

    Authors: Yinan Huang, Haoyu Wang, Pan Li

    Abstract: Positional encodings (PE) for graphs are essential in constructing powerful and expressive graph neural networks and graph transformers as they effectively capture relative spatial relations between nodes. While PEs for undirected graphs have been extensively studied, those for directed graphs remain largely unexplored, despite the fundamental role of directed graphs in representing entities with…

    Submitted 30 July, 2024; originally announced July 2024.

  36. arXiv:2407.19493

    cs.CV cs.AI cs.MM

    Official-NV: A News Video Dataset for Multimodal Fake News Detection

    Authors: Yihao Wang, Lizhi Chen, Zhong Qian, Peifeng Li

    Abstract: News media, especially video news media, have penetrated into every aspect of daily life, which also brings the risk of fake news. Therefore, multimodal fake news detection has recently received more attention. However, the number of fake news detection datasets for the video modality is small, and these datasets are composed of unofficial videos uploaded by users, so they contain too much useless data. To…

    Submitted 28 July, 2024; originally announced July 2024.

  37. ClickDiff: Click to Induce Semantic Contact Map for Controllable Grasp Generation with Diffusion Models

    Authors: Peiming Li, Ziyi Wang, Mengyuan Liu, Hong Liu, Chen Chen

    Abstract: Grasp generation aims to create complex hand-object interactions with a specified object. While traditional approaches for hand generation have primarily focused on visibility and diversity under scene constraints, they tend to overlook the fine-grained hand-object interactions such as contacts, resulting in inaccurate and undesired grasps. To address these challenges, we propose a controllable gr…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: ACM Multimedia 2024

  38. arXiv:2407.19282

    eess.IV cs.CV

    A self-supervised and adversarial approach to hyperspectral demosaicking and RGB reconstruction in surgical imaging

    Authors: Peichao Li, Oscar MacCormac, Jonathan Shapey, Tom Vercauteren

    Abstract: Hyperspectral imaging holds promise in surgical imaging by offering biological tissue differentiation capabilities with detailed information that is invisible to the naked eye. For intra-operative guidance, real-time spectral data capture and display is mandated. Snapshot mosaic hyperspectral cameras are currently seen as the most suitable technology given this requirement. However, snapshot mosa…

    Submitted 27 July, 2024; originally announced July 2024.

  39. WalkTheDog: Cross-Morphology Motion Alignment via Phase Manifolds

    Authors: Peizhuo Li, Sebastian Starke, Yuting Ye, Olga Sorkine-Hornung

    Abstract: We present a new approach for understanding the periodicity structure and semantics of motion datasets, independently of the morphology and skeletal structure of characters. Unlike existing methods using an overly sparse high-dimensional latent, we propose a phase manifold consisting of multiple closed curves, each corresponding to a latent amplitude. With our proposed vector quantized periodic au…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: SIGGRAPH 2024. Project page: https://peizhuoli.github.io/walkthedog Video: https://www.youtube.com/watch?v=tNVO2jqeTNw

  40. arXiv:2407.18716

    cs.CL

    ChatSchema: A pipeline of extracting structured information with Large Multimodal Models based on schema

    Authors: Fei Wang, Yuewen Zheng, Qin Li, Jingyi Wu, Pengfei Li, Luxia Zhang

    Abstract: Objective: This study introduces ChatSchema, an effective method for extracting and structuring information from unstructured data in medical paper reports using a combination of Large Multimodal Models (LMMs) and Optical Character Recognition (OCR) based on the schema. By integrating predefined schema, we intend to enable LMMs to directly extract and standardize information according to the schem…

    Submitted 26 July, 2024; originally announced July 2024.

  41. arXiv:2407.14676

    cs.CV

    On Learning Discriminative Features from Synthesized Data for Self-Supervised Fine-Grained Visual Recognition

    Authors: Zihu Wang, Lingqiao Liu, Scott Ricardo Figueroa Weston, Samuel Tian, Peng Li

    Abstract: Self-Supervised Learning (SSL) has become a prominent approach for acquiring visual representations across various tasks, yet its application in fine-grained visual recognition (FGVR) is challenged by the intricate task of distinguishing subtle differences between categories. To overcome this, we introduce a novel strategy that boosts SSL's ability to extract critical discriminative features vita…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  42. arXiv:2407.13803

    cs.CR cs.AI cs.CL

    Less is More: Sparse Watermarking in LLMs with Enhanced Text Quality

    Authors: Duy C. Hoang, Hung T. Q. Le, Rui Chu, Ping Li, Weijie Zhao, Yingjie Lao, Khoa D. Doan

    Abstract: With the widespread adoption of Large Language Models (LLMs), concerns about potential misuse have emerged. To this end, watermarking has been adapted to LLMs, enabling a simple and effective way to detect and monitor generated text. However, while the existing methods can differentiate between watermarked and unwatermarked text with high accuracy, they often face a trade-off between the quality of…

    Submitted 17 July, 2024; originally announced July 2024.

  43. arXiv:2407.13553  [pdf, other]

    cs.CV

    SAM-Driven Weakly Supervised Nodule Segmentation with Uncertainty-Aware Cross Teaching

    Authors: Xingyue Zhao, Peiqi Li, Xiangde Luo, Meng Yang, Shi Chang, Zhongyu Li

    Abstract: Automated nodule segmentation is essential for computer-assisted diagnosis in ultrasound images. Nevertheless, most existing methods depend on precise pixel-level annotations by medical professionals, a process that is both costly and labor-intensive. Recently, segmentation foundation models like SAM have shown impressive generalizability on natural images, suggesting their potential as pseudo-lab…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ISBI 2024 Oral

  44. arXiv:2407.13284  [pdf, other]

    cs.IR

    Semantic-aware Representation Learning for Homography Estimation

    Authors: Yuhan Liu, Qianxin Huang, Siqi Hui, Jingwen Fu, Sanping Zhou, Kangyi Wu, Pengna Li, Jinjun Wang

    Abstract: Homography estimation is the task of determining the transformation from an image pair. Our approach focuses on employing detector-free feature matching methods to address this issue. Previous work has underscored the importance of incorporating semantic information; however, an efficient way to utilize semantic information is still lacking. Previous methods suffer from treating the semantics as a…

    Submitted 5 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  45. arXiv:2407.12504  [pdf, other]

    cs.CL

    Case2Code: Learning Inductive Reasoning with Synthetic Data

    Authors: Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

    Abstract: Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool-using to solve challenging tasks step-by-step. In this paper, we hope to focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples o…

    Submitted 17 July, 2024; originally announced July 2024.

  46. arXiv:2407.11522  [pdf, other]

    cs.CV

    FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models

    Authors: Pengxiang Li, Zhi Gao, Bofei Zhang, Tao Yuan, Yuwei Wu, Mehrtash Harandi, Yunde Jia, Song-Chun Zhu, Qing Li

    Abstract: Vision language models (VLMs) have achieved impressive progress in diverse applications, becoming a prevalent research direction. In this paper, we build FIRE, a feedback-refinement dataset, consisting of 1.1M multi-turn conversations that are derived from 27 source datasets, empowering VLMs to spontaneously refine their responses based on user feedback across diverse tasks. To scale up the data c…

    Submitted 16 July, 2024; originally announced July 2024.

  47. Incremental high average-utility itemset mining: survey and challenges

    Authors: Jing Chen, Shengyi Yang, Weiping Ding, Peng Li, Aijun Liu, Hongjun Zhang, Tian Li

    Abstract: The High Average Utility Itemset Mining (HAUIM) technique, a variation of High Utility Itemset Mining (HUIM), uses the average utility of the itemsets. Historically, most HAUIM algorithms were designed for static databases. However, practical applications like market basket analysis and business decision-making necessitate regular updates of the database with new transactions. As a result, researc…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 25 pages, 23 figures

  48. arXiv:2407.09057  [pdf, other]

    cs.CV

    PersonificationNet: Making customized subject act like a person

    Authors: Tianchu Guo, Pengyu Li, Biao Wang, Xiansheng Hua

    Abstract: Recently, customized generation, which uses as few as 3-5 user-provided images to train a model to synthesize new images of a specified subject, has shown significant potential. Though subsequent applications enhance the flexibility and diversity of customized generation, fine-grained control over making the given subject act in a person's pose remains understudied. In this paper, we propose a Personifi…

    Submitted 12 July, 2024; originally announced July 2024.

  49. arXiv:2407.08109  [pdf, other]

    cs.CV cs.AI cs.LG

    Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter

    Authors: Suqi Song, Chenxu Zhang, Peng Zhang, Pengkun Li, Fenglong Song, Lei Zhang

    Abstract: Urban waterlogging poses a major risk to public safety and infrastructure. Conventional methods using water-level sensors require high maintenance and can hardly achieve full coverage. Recent advances employ surveillance camera imagery and deep learning for detection, yet these struggle amidst scarce data and adverse environmental conditions. In this paper, we establish a challenging Urban Waterlogging Be…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  50. arXiv:2407.06394  [pdf, other]

    cs.RO cs.MA

    Modeling and Analysis of Multi-Line Orders in Multi-Tote Storage and Retrieval Autonomous Mobile Robot Systems

    Authors: Xiaotao Shan, Yichao Jin, Peizheng Li, Koichi Kondo

    Abstract: As warehouses are emphasizing space utilization and the ability to handle multi-line orders, multi-tote storage and retrieval (MTSR) autonomous mobile robot systems, where robots directly retrieve totes from high shelves, are becoming increasingly popular. This paper presents a novel shared-token, multi-class, semi-open queueing network model to account for multi-line orders with general distribut…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures. This paper has been accepted for publication in IEEE 20th International Conference on Automation Science and Engineering (IEEE CASE 2024)