Skip to main content

Showing 1–50 of 685 results for author: Liang, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2409.01151  [pdf, other

    cs.CV cs.LG

    Understanding Multimodal Hallucination with Parameter-Free Representation Alignment

    Authors: Yueqian Wang, Jianxin Liang, Yuxuan Wang, Huishuai Zhang, Dongyan Zhao

    Abstract: Hallucination is a common issue in Multimodal Large Language Models (MLLMs), yet the underlying principles remain poorly understood. In this paper, we investigate which components of MLLMs contribute to object hallucinations. To analyze image representations while completely avoiding the influence of all other factors other than the image representation itself, we propose a parametric-free represe… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

  2. arXiv:2409.01128  [pdf, other

    cs.LG cs.CV

    Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning

    Authors: Jinglin Liang, Jin Zhong, Hanlin Gu, Zhongqi Lu, Xingxing Tang, Gang Dai, Shuangping Huang, Lixin Fan, Qiang Yang

    Abstract: Federated Class Continual Learning (FCCL) merges the challenges of distributed client learning with the need for seamless adaptation to new classes without forgetting old ones. The key challenge in FCCL is catastrophic forgetting, an issue that has been explored to some extent in Continual Learning (CL). However, due to privacy preservation requirements, some conventional methods, such as experien… ▽ More

    Submitted 3 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted by ECCV 2024 Oral

  3. arXiv:2408.14975  [pdf, other

    cs.CV

    MegActor-$Σ$: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer

    Authors: Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan, Jin Wang

    Abstract: Diffusion models have demonstrated superior performance in the field of portrait animation. However, current approaches relied on either visual or audio modality to control character movements, failing to exploit the potential of mixed-modal control. This challenge arises from the difficulty in balancing the weak control strength of audio modality and the strong control strength of visual modality… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

  4. arXiv:2408.14744  [pdf, other

    cs.CV cs.AI

    RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models

    Authors: Junyao Ge, Yang Zheng, Kaitai Guo, Jimin Liang

    Abstract: Abundant, well-annotated multimodal data in remote sensing are pivotal for aligning complex visual remote sensing (RS) scenes with human language, enabling the development of specialized vision language models across diverse RS interpretation tasks. However, annotating RS images with rich linguistic semantics at scale demands expertise in RS and substantial human labor, making it costly and often… ▽ More

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Submitted to ISPRS

    ACM Class: I.4.8; I.2.10

  5. arXiv:2408.12769  [pdf, other

    cs.CV cs.NI

    Enhancing Vehicle Environmental Awareness via Federated Learning and Automatic Labeling

    Authors: Chih-Yu Lin, Jin-Wei Liang

    Abstract: Vehicle environmental awareness is a crucial issue in improving road safety. Through a variety of sensors and vehicle-to-vehicle communication, vehicles can collect a wealth of data. However, to make these data useful, sensor data must be integrated effectively. This paper focuses on the integration of image data and vehicle-to-vehicle communication data. More specifically, our goal is to identify… ▽ More

    Submitted 22 August, 2024; originally announced August 2024.

  6. arXiv:2408.11557  [pdf, other

    cs.IR

    A Quick, trustworthy spectral detection Q&A system based on the SDAAP Dataset and large language model

    Authors: Jiheng Liang, Ziru Yu, Zujie Xie, Xiangyang Yu

    Abstract: Large Language Model (LLM) has demonstrated significant success in a range of natural language processing (NLP) tasks within general domain. The emergence of LLM has introduced innovative methodologies across diverse fields, including the natural sciences. Researchers aim to implement automated, concurrent process driven by LLM to supplant conventional manual, repetitive and labor-intensive work.… ▽ More

    Submitted 23 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 16 pages,10 figures,3 tables

  7. arXiv:2408.10602  [pdf, other

    cs.CV cs.AI

    MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation

    Authors: Jintao Cheng, Xingming Chen, Jinxin Liang, Xiaoyu Tang, Xieyuanli Chen, Dachuan Li

    Abstract: Effectively summarizing dense 3D point cloud data and extracting motion information of moving objects (moving object segmentation, MOS) is crucial to autonomous driving and robotics applications. How to effectively utilize motion and semantic features and avoid information loss during 3D-to-2D projection is still a key challenge. In this paper, we propose a novel multi-view MOS model (MV-MOS) by f… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 7 pages, 4 figures

  8. arXiv:2408.09955  [pdf, other

    cs.MA

    MegaAgent: A Practical Framework for Autonomous Cooperation in Large-Scale LLM Agent Systems

    Authors: Qian Wang, Tianyu Wang, Qinbin Li, Jingsheng Liang, Bingsheng He

    Abstract: With the emergence of large language models (LLMs), LLM-powered multi-agent systems (LLM-MA systems) have been proposed to tackle real-world tasks. However, their agents mostly follow predefined Standard Operating Procedures (SOPs) that remain unchanged across the whole interaction, lacking autonomy and scalability. Additionally, current solutions often overlook the necessity for effective agent c… ▽ More

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  9. arXiv:2408.09103  [pdf

    cs.CE

    Provide Proactive Reproducible Analysis Transparency with Every Publication

    Authors: Paul Meijer, Nicole Howard, Jessica Liang, Autumn Kelsey, Sathya Subramanian, Ed Johnson, Paul Mariz, James Harvey, Madeline Ambrose, Vitalii Tereshchenko, Aldan Beaubien, Neelima Inala, Yousef Aggoune, Stark Pister, Anne Vetto, Melissa Kinsey, Tom Bumol, Ananda Goldrath, Xiaojun Li, Troy Torgerson, Peter Skene, Lauren Okada, Christian La France, Zach Thomson, Lucas Graybuck

    Abstract: The high incidence of irreproducible research has led to urgent appeals for transparency and equitable practices in open science. For the scientific disciplines that rely on computationally intensive analyses of large data sets, a granular understanding of the analysis methodology is an essential component of reproducibility. This paper discusses the guiding principles of a computational reproduci… ▽ More

    Submitted 17 August, 2024; originally announced August 2024.

  10. arXiv:2408.05713  [pdf, other

    cs.CV

    SSL: A Self-similarity Loss for Improving Generative Image Super-resolution

    Authors: Du Chen, Zhengqiang Zhang, Jie Liang, Lei Zhang

    Abstract: Generative adversarial networks (GAN) and generative diffusion models (DM) have been widely used in real-world image super-resolution (Real-ISR) to enhance the image perceptual quality. However, these generative models are prone to generating visual artifacts and false image structures, resulting in unnatural Real-ISR results. Based on the fact that natural images exhibit high self-similarities, i… ▽ More

    Submitted 18 August, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

  11. arXiv:2408.03768  [pdf, other

    cs.RO

    HDPlanner: Advancing Autonomous Deployments in Unknown Environments through Hierarchical Decision Networks

    Authors: Jingsong Liang, Yuhong Cao, Yixiao Ma, Hanqi Zhao, Guillaume Sartoretti

    Abstract: In this paper, we introduce HDPlanner, a deep reinforcement learning (DRL) based framework designed to tackle two core and challenging tasks for mobile robots: autonomous exploration and navigation, where the robot must optimize its trajectory adaptively to achieve the task objective through continuous interactions in unknown environments. Specifically, HDPlanner relies on novel hierarchical atten… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Submitted to RA-L

  12. arXiv:2408.02693  [pdf, other

    physics.comp-ph cs.AI

    Diff-PIC: Revolutionizing Particle-In-Cell Simulation for Advancing Nuclear Fusion with Diffusion Models

    Authors: Chuan Liu, Chunshu Wu, Shihui Cao, Mingkai Chen, James Chenhao Liang, Ang Li, Michael Huang, Chuang Ren, Dongfang Liu, Ying Nian Wu, Tong Geng

    Abstract: Sustainable energy is a crucial global challenge, and recent breakthroughs in nuclear fusion ignition underscore the potential of harnessing energy extracted from nuclear fusion in everyday life, thereby drawing significant attention to fusion ignition research, especially Laser-Plasma Interaction (LPI). Unfortunately, the complexity of LPI at ignition scale renders theory-based analysis nearly im… ▽ More

    Submitted 19 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

  13. arXiv:2408.02454  [pdf, other

    cs.RO

    TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments

    Authors: Daeun Song, Jing Liang, Xuesu Xiao, Dinesh Manocha

    Abstract: We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in challenging scenarios with unstructured off-road features like buildings, grass, and curbs. Our goal is to compute suitable trajectories that (1) satisfy the environment-specific traversability constraints and (2) generate human-like paths while navigating in crosswalks, sidewalks, e… ▽ More

    Submitted 7 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  14. arXiv:2408.01791  [pdf

    cs.NI

    Implementing NAT Hole Punching with QUIC

    Authors: Jinyu Liang, Wei Xu, Taotao Wang, Qing Yang, Shengli Zhang

    Abstract: The widespread adoption of Network Address Translation (NAT) technology has led to a significant number of network end nodes being located in private networks behind NAT devices, impeding direct communication between these nodes. To solve this problem, a technique known as "hole punching" has been devised for NAT traversal to facilitate peer-to-peer communication among end nodes located in distinc… ▽ More

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted for oral presentation at the VTC2024-Fall Conference

  15. arXiv:2407.20203  [pdf, other

    cs.RO

    Privileged Reinforcement and Communication Learning for Distributed, Bandwidth-limited Multi-robot Exploration

    Authors: Yixiao Ma, Jingsong Liang, Yuhong Cao, Derek Ming Siang Tan, Guillaume Sartoretti

    Abstract: Communication bandwidth is an important consideration in multi-robot exploration, where information exchange among robots is critical. While existing methods typically aim to reduce communication throughput, they either require significant computation or significantly compromise exploration efficiency. In this work, we propose a deep reinforcement learning framework based on communication and priv… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by DARS2024

  16. arXiv:2407.18242  [pdf, other

    cs.LG cs.AI cs.CL

    LoRA-Pro: Are Low-Rank Adapters Properly Optimized?

    Authors: Zhengbo Wang, Jian Liang

    Abstract: Low-Rank Adaptation, also known as LoRA, has emerged as a prominent method for parameter-efficient fine-tuning foundation models by re-parameterizing the original matrix into the product of two low-rank matrices. Despite its efficiency, LoRA often yields inferior performance compared to full fine-tuning. In this paper, we propose LoRA-Pro to bridge this performance gap. Firstly, we delve into the… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  17. arXiv:2407.17457  [pdf, other

    cs.CV cs.RO

    CSCPR: Cross-Source-Context Indoor RGB-D Place Recognition

    Authors: Jing Liang, Zhuo Deng, Zheming Zhou, Min Sun, Omid Ghasemalizadeh, Cheng-Hao Kuo, Arnie Sen, Dinesh Manocha

    Abstract: We present a new algorithm, Cross-Source-Context Place Recognition (CSCPR), for RGB-D indoor place recognition that integrates global retrieval and reranking into a single end-to-end model. Unlike prior approaches that primarily focus on the RGB domain, CSCPR is designed to handle the RGB-D data. We extend the Context-of-Clusters (CoCs) for handling noisy colorized point clouds and introduce two n… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

  18. arXiv:2407.16131  [pdf, other

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Crystals with Transformers on Graphs, for Prediction of Unconventional Crystal Material Properties and the Benchmark

    Authors: Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong, Bolong Huang, Hua Zhang

    Abstract: The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of cr… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

  19. arXiv:2407.15773  [pdf, other

    cs.LG cs.CV

    STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay

    Authors: Yongcan Yu, Lijun Sheng, Ran He, Jian Liang

    Abstract: Test-time adaptation (TTA) aims to address the distribution shift between the training and test data with only unlabeled data at test time. Existing TTA methods often focus on improving recognition performance specifically for test data associated with classes in the training set. However, during the open-world inference process, there are inevitably test data instances from unknown classes, commo… ▽ More

    Submitted 27 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024; Fixed a bug in calculating OOD score of STAMP and updated the results

  20. arXiv:2407.15047  [pdf, other

    cs.CV cs.CL

    End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling

    Authors: Jianxin Liang, Xiaojun Meng, Yueqian Wang, Chang Liu, Qun Liu, Dongyan Zhao

    Abstract: Video Question Answering (VideoQA) has emerged as a challenging frontier in the field of multimedia processing, requiring intricate interactions between visual and textual modalities. Simply uniformly sampling frames or indiscriminately aggregating frame-level visual features often falls short in capturing the nuanced and relevant contexts of videos to well perform VideoQA. To mitigate these issue… ▽ More

    Submitted 23 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

  21. arXiv:2407.14655  [pdf, ps, other

    cs.CV

    LORTSAR: Low-Rank Transformer for Skeleton-based Action Recognition

    Authors: Soroush Oraki, Harry Zhuang, Jie Liang

    Abstract: The complexity of state-of-the-art Transformer-based models for skeleton-based action recognition poses significant challenges in terms of computational efficiency and resource utilization. In this paper, we explore the application of Singular Value Decomposition (SVD) to effectively reduce the model sizes of these pre-trained models, aiming to minimize their resource consumption while preserving… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 12 pages

  22. arXiv:2407.13642  [pdf, other

    cs.CV

    Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

    Authors: Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher

    Abstract: In this paper, we investigate the use of diffusion models which are pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding. We propose a novel method, namely Diff2Scene, which leverages frozen representations from text-image generative models, along with salient-aware and geometric-aware masks, for open-vocabulary 3D semantic segmentation and visual grounding… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  23. arXiv:2407.11098  [pdf, other

    cs.LG cs.AI

    Inertial Confinement Fusion Forecasting via LLMs

    Authors: Mingkai Chen, Taowen Wang, James Chenhao Liang, Chuan Liu, Chunshu Wu, Qifan Wang, Ying Nian Wu, Michael Huang, Chuang Ren, Ang Li, Tong Geng, Dongfang Liu

    Abstract: Controlled fusion energy is deemed pivotal for the advancement of human civilization. In this study, we introduce $\textbf{Fusion-LLM}$, a novel integration of Large Language Models (LLMs) with classical reservoir computing paradigms tailored to address challenges in Inertial Confinement Fusion ($\texttt{ICF}$). Our approach offers several key contributions: Firstly, we propose the… ▽ More

    Submitted 8 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  24. arXiv:2407.08231  [pdf, other

    cs.CV

    E2VIDiff: Perceptual Events-to-Video Reconstruction using Diffusion Priors

    Authors: Jinxiu Liang, Bohan Yu, Yixin Yang, Yiming Han, Boxin Shi

    Abstract: Event cameras, mimicking the human retina, capture brightness changes with unparalleled temporal resolution and dynamic range. Integrating events into intensities poses a highly ill-posed challenge, marred by initial condition ambiguities. Traditional regression-based deep learning methods fall short in perceptual quality, offering deterministic and often unrealistic reconstructions. In this paper… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  25. arXiv:2407.08164  [pdf, other

    cs.AI cs.MA cs.RO

    Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks

    Authors: Pu Feng, Junkang Liang, Size Wang, Xin Yu, Xin Ji, Yiting Chen, Kui Zhang, Rongye Shi, Wenjun Wu

    Abstract: In multi-agent reinforcement learning (MARL), the Centralized Training with Decentralized Execution (CTDE) framework is pivotal but struggles due to a gap: global state guidance in training versus reliance on local observations in execution, lacking global signals. Inspired by human societal consensus mechanisms, we introduce the Hierarchical Consensus-based Multi-Agent Reinforcement Learning (HC-… ▽ More

    Submitted 23 August, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 10 figures. Accepted for presentation at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  26. arXiv:2407.06935  [pdf, other

    cs.LG stat.CO stat.ML

    Bayesian Federated Learning with Hamiltonian Monte Carlo: Algorithm and Theory

    Authors: Jiajun Liang, Qian Zhang, Wei Deng, Qifan Song, Guang Lin

    Abstract: This work introduces a novel and efficient Bayesian federated learning algorithm, namely, the Federated Averaging stochastic Hamiltonian Monte Carlo (FA-HMC), for parameter estimation and uncertainty quantification. We establish rigorous convergence guarantees of FA-HMC on non-iid distributed data sets, under the strong convexity and Hessian smoothness assumptions. Our analysis investigates the ef… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  27. arXiv:2407.06536  [pdf, other

    cs.NE

    A Two-stage Evolutionary Framework For Multi-objective Optimization

    Authors: Peng Chen, Jing Liang, Kangjia Qiao, Ponnuthurai Nagaratnam Suganthan, Xuanxuan Ban

    Abstract: In the field of evolutionary multi-objective optimization, the approximation of the Pareto front (PF) is achieved by utilizing a collection of representative candidate solutions that exhibit desirable convergence and diversity. Although several multi-objective evolutionary algorithms (MOEAs) have been designed, they still have difficulties in keeping balance between convergence and diversity of po… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by the CEC conference of WCCI2024

  28. arXiv:2407.05726  [pdf, other

    cs.CV eess.IV

    Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis

    Authors: Zirui Zhou, Junhao Liang, Zizhao Peng, Chao Fan, Fengwei An, Shiqi Yu

    Abstract: Scoliosis presents significant diagnostic challenges, particularly in adolescents, where early detection is crucial for effective treatment. Traditional diagnostic and follow-up methods, which rely on physical examinations and radiography, face limitations due to the need for clinical expertise and the risk of radiation exposure, thus restricting their use for widespread early screening. In respon… ▽ More

    Submitted 23 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  29. arXiv:2407.04105  [pdf, other

    cs.CL cs.AI

    Can Pre-trained Language Models Understand Chinese Humor?

    Authors: Yuyan Chen, Zhixu Li, Jiaqing Liang, Yanghua Xiao, Bang Liu, Yunwen Chen

    Abstract: Humor understanding is an important and challenging research in natural language processing. As the popularity of pre-trained language models (PLMs), some recent work makes preliminary attempts to adopt PLMs for humor recognition and generation. However, these simple attempts do not substantially answer the question: {\em whether PLMs are capable of humor understanding?} This paper is the first wo… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to WSDM 2022

  30. arXiv:2407.03131  [pdf, other

    cs.NE cs.AI eess.SP

    MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition

    Authors: Yanjie Cui, Xiaohong Liu, Jing Liang, Yamin Fu

    Abstract: Electroencephalography (EEG), a medical imaging technique that captures scalp electrical activity of brain structures via electrodes, has been widely used in affective computing. The spatial domain of EEG is rich in affective information. However, few of the existing studies have simultaneously analyzed EEG signals from multiple perspectives of geometric and anatomical structures in spatial domain… ▽ More

    Submitted 6 August, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  31. arXiv:2407.01796  [pdf, other

    cs.CL

    Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation

    Authors: Sirui Xia, Xintao Wang, Jiaqing Liang, Yifei Zhang, Weikang Zhou, Jiaji Deng, Fei Yu, Yanghua Xiao

    Abstract: Retrieval-Augmented Generation (RAG) has been widely adopted to enhance Large Language Models (LLMs) in knowledge-intensive tasks. Recently, Attributed Text Generation (ATG) has attracted growing attention, which provides citations to support the model's responses in RAG, so as to enhance the credibility of LLM-generated content and facilitate verification. Prior methods mainly adopt coarse-graine… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 15 pages,2 figures

  32. arXiv:2407.01607  [pdf, other

    cs.LG cs.IR stat.ML

    Multi-Epoch learning with Data Augmentation for Deep Click-Through Rate Prediction

    Authors: Zhongxiang Fan, Zhaocheng Liu, Jian Liang, Dongying Kong, Han Li, Peng Jiang, Shuang Li, Kun Gai

    Abstract: This paper investigates the one-epoch overfitting phenomenon in Click-Through Rate (CTR) models, where performance notably declines at the start of the second epoch. Despite extensive research, the efficacy of multi-epoch training over the conventional one-epoch approach remains unclear. We identify the overfitting of the embedding layer, caused by high-dimensional data sparsity, as the primary is… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

  33. arXiv:2407.00653  [pdf, other

    cs.CL cs.AI

    Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs

    Authors: Yifei Zhang, Xintao Wang, Jiaqing Liang, Sirui Xia, Lida Chen, Yanghua Xiao

    Abstract: Large Language Models (LLMs) have exhibited impressive proficiency in various natural language processing (NLP) tasks, which involve increasingly complex reasoning. Knowledge reasoning, a primary type of reasoning, aims at deriving new knowledge from existing one.While it has been widely studied in the context of knowledge graphs (KGs), knowledge reasoning in LLMs remains underexplored. In this pa… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  34. arXiv:2406.18921  [pdf, other

    cs.CL

    Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data

    Authors: Yiting Ran, Xintao Wang, Rui Xu, Xinfeng Yuan, Jiaqing Liang, Yanghua Xiao, Deqing Yang

    Abstract: Role-playing agents (RPA) have been a popular application area for large language models (LLMs), attracting significant interest from both industry and academia.While existing RPAs well portray the characters' knowledge and tones, they face challenges in capturing their minds, especially for small role-playing language models (RPLMs). In this paper, we propose to enhance RPLMs via personality-indi… ▽ More

    Submitted 29 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 10pages

  35. arXiv:2406.18844  [pdf, other

    cs.CV

    Revisiting Backdoor Attacks against Large Vision-Language Models

    Authors: Siyuan Liang, Jiawei Liang, Tianyu Pang, Chao Du, Aishan Liu, Ee-Chien Chang, Xiaochun Cao

    Abstract: Instruction tuning enhances large vision-language models (LVLMs) but raises security risks through potential backdoor attacks due to their openness. Previous backdoor studies focus on enclosed scenarios with consistent training and testing instructions, neglecting the practical domain gaps that could affect attack effectiveness. This paper empirically examines the generalizability of backdoor atta… ▽ More

    Submitted 1 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 24 pages, 8 figures

  36. arXiv:2406.18345  [pdf, other

    cs.LG eess.SP

    EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition

    Authors: Yi Ding, Chengxuan Tong, Shuailei Zhang, Muyun Jiang, Yong Li, Kevin Lim Jun Liang, Cuntai Guan

    Abstract: Integrating prior knowledge of neurophysiology into neural network architecture enhances the performance of emotion decoding. While numerous techniques emphasize learning spatial and short-term temporal patterns, there has been limited emphasis on capturing the vital long-term contextual information associated with emotional cognitive processes. In order to address this discrepancy, we introduce a… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 11 pages, 5 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  37. arXiv:2406.18227  [pdf, other

    cs.CV cs.CL

    GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension

    Authors: Jiafeng Liang, Shixin Jiang, Zekun Wang, Haojie Pan, Zerui Chen, Zheng Chu, Ming Liu, Ruiji Fu, Zhongyuan Wang, Bing Qin

    Abstract: There are substantial instructional videos on the Internet, which provide us tutorials for completing various tasks. Existing instructional video datasets only focus on specific steps at the video level, lacking experiential guidelines at the task level, which can lead to beginners struggling to learn new tasks due to the lack of relevant experience. Moreover, the specific steps without guidelines… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: IJCAI 2024

  38. arXiv:2406.18115  [pdf, other

    cs.RO cs.AI cs.CV

    Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps

    Authors: Dicong Qiu, Wenzong Ma, Zhenfu Pan, Hui Xiong, Junwei Liang

    Abstract: Open-Vocabulary Mobile Manipulation (OVMM) is a crucial capability for autonomous robots, especially when faced with the challenges posed by unknown and dynamic environments. This task requires robots to explore and build a semantic understanding of their surroundings, generate feasible plans to achieve manipulation goals, adapt to environmental changes, and comprehend natural language instruction… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Open-vocabulary, Mobile Manipulation, Dynamic Environments, 3D Semantic Maps, Zero-shot, LLMs, VLMs, 18 pages, 2 figures

  39. arXiv:2406.16862  [pdf, other

    cs.RO cs.CV

    Dreamitate: Real-World Visuomotor Policy Learning via Video Generation

    Authors: Junbang Liang, Ruoshi Liu, Ege Ozguroglu, Sruthi Sudhakar, Achal Dave, Pavel Tokmakov, Shuran Song, Carl Vondrick

    Abstract: A key challenge in manipulation is learning a policy that can robustly generalize to diverse visual environments. A promising mechanism for learning robust policies is to leverage video generative models, which are pretrained on large-scale datasets of internet videos. In this paper, we propose a visuomotor policy learning framework that fine-tunes a video diffusion model on human demonstrations o… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Project page: https://dreamitate.cs.columbia.edu/

  40. arXiv:2406.14503  [pdf, other

    cs.CL

    Overview of the CAIL 2023 Argument Mining Track

    Authors: Jingcong Liang, Junlong Wang, Xinyu Zhai, Yungui Zhuang, Yiyang Zheng, Xin Xu, Xiandong Ran, Xiaozheng Dong, Honghui Rong, Yanlun Liu, Hao Chen, Yuhan Wei, Donghai Li, Jiajie Peng, Xuanjing Huang, Chongde Shi, Yansong Feng, Yun Song, Zhongyu Wei

    Abstract: We give a detailed overview of the CAIL 2023 Argument Mining Track, one of the Chinese AI and Law Challenge (CAIL) 2023 tracks. The main goal of the track is to identify and extract interacting argument pairs in trial dialogs. It mainly uses summarized judgment documents but can also refer to trial recordings. The track consists of two stages, and we introduce the tasks designed for each stage; we… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  41. arXiv:2406.14235  [pdf, other

    cs.CV cs.RO

    Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

    Authors: Jiaming Zhou, Teli Ma, Kun-Yu Lin, Ronghe Qiu, Zifan Wang, Junwei Liang

    Abstract: Learning generalizable visual dynamic representation across different embodied environments is crucial for real-world robotic manipulation. As the scale and diversity of robot demonstration data are limited, recent works have turned to large-scale pre-training using human data. However, the morphological differences between humans and robots introduce a significant human-robot domain discrepancy,… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  42. arXiv:2406.11455  [pdf, other

    cs.CL cs.AI

    Adaptive Reinforcement Learning Planning: Harnessing Large Language Models for Complex Information Extraction

    Authors: Zepeng Ding, Ruiyang Ke, Wenhao Huang, Guochao Jiang, Yanda Li, Deqing Yang, Jiaqing Liang

    Abstract: Existing research on large language models (LLMs) shows that they can solve information extraction tasks through multi-step planning. However, their extraction behavior on complex sentences and tasks is unstable, emerging issues such as false positives and missing elements. We observe that decomposing complex extraction tasks and extracting them step by step can effectively improve LLMs' performan… ▽ More

    Submitted 29 August, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  43. arXiv:2406.10902  [pdf, other

    cs.CV cs.CL

    Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models

    Authors: Yikai Zhang, Qianyu He, Xintao Wang, Siyu Yuan, Jiaqing Liang, Yanghua Xiao

    Abstract: Multi-Modal Knowledge Graphs (MMKGs) have proven valuable for various downstream tasks. However, scaling them up is challenging because building large-scale MMKGs often introduces mismatched images (i.e., noise). Most entities in KGs belong to the long tail, meaning there are few images of them available online. This scarcity makes it difficult to determine whether a found image matches the entity… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  44. arXiv:2406.10881  [pdf, other

    cs.CL

    Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals

    Authors: Lida Chen, Zujie Liang, Xintao Wang, Jiaqing Liang, Yanghua Xiao, Feng Wei, Jinglei Chen, Zhenghong Hao, Bing Han, Wei Wang

    Abstract: Large language models (LLMs) have achieved great success, but their occasional content fabrication, or hallucination, limits their practical application. Hallucination arises because LLMs struggle to admit ignorance due to inadequate training on knowledge boundaries. We call it a limitation of LLMs that they can not accurately express their knowledge boundary, answering questions they know while a… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  45. arXiv:2406.10813  [pdf, other

    cs.CL

    Self-Evolution Fine-Tuning for Policy Optimization

    Authors: Ruijun Chen, Jiehao Liang, Shiping Gao, Fanqi Wan, Xiaojun Quan

    Abstract: The alignment of large language models (LLMs) is crucial not only for unlocking their potential in specific tasks but also for ensuring that responses meet human expectations and adhere to safety and ethical principles. Current alignment methodologies face considerable challenges. For instance, supervised fine-tuning (SFT) requires extensive, high-quality annotated samples, while reinforcement lea… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  46. arXiv:2406.09838  [pdf, other

    cs.CV cs.AI

    Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps

    Authors: Jian Chen, Peilin Zhou, Yining Hua, Dading Chong, Meng Cao, Yaowei Li, Zixuan Yuan, Bing Zhu, Junwei Liang

    Abstract: Real-time detection and prediction of extreme weather protect human lives and infrastructure. Traditional methods rely on numerical threshold setting and manual interpretation of weather heatmaps with Geographic Information Systems (GIS), which can be slow and error-prone. Our research redefines Extreme Weather Events Detection (EWED) by framing it as a Visual Question Answering (VQA) problem, the… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  47. arXiv:2406.09738  [pdf, other

    cs.RO cs.CV

    Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation

    Authors: Teli Ma, Jiaming Zhou, Zifan Wang, Ronghe Qiu, Junwei Liang

    Abstract: Developing robots capable of executing various manipulation tasks, guided by natural language instructions and visual observations of intricate real-world environments, remains a significant challenge in robotics. Such robot agents need to understand linguistic commands and distinguish between the requirements of different tasks. In this work, we present Sigma-Agent, an end-to-end imitation learni… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  48. arXiv:2406.09534  [pdf, other

    cs.DB cs.LG

    FeatNavigator: Automatic Feature Augmentation on Tabular Data

    Authors: Jiaming Liang, Chuan Lei, Xiao Qin, Jiani Zhang, Asterios Katsifodimos, Christos Faloutsos, Huzefa Rangwala

    Abstract: Data-centric AI focuses on understanding and utilizing high-quality, relevant data in training machine learning (ML) models, thereby increasing the likelihood of producing accurate and useful results. Automatic feature augmentation, aiming to augment the initial base table with useful features from other tables, is critical in data preparation as it improves model performance, robustness, and gene… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 15 pages, 41 figures

  49. arXiv:2406.09159  [pdf

    q-bio.QM cs.AI cs.CG q-bio.GN

    ALPHAGMUT: A Rationale-Guided Alpha Shape Graph Neural Network to Evaluate Mutation Effects

    Authors: Boshen Wang, Bowei Ye, Lin Xu, Jie Liang

    Abstract: In silico methods evaluating the mutation effects of missense mutations are providing an important approach for understanding mutations in personal genomes and identifying disease-relevant biomarkers. However, existing methods, including deep learning methods, heavily rely on sequence-aware information, and do not fully leverage the potential of available 3D structural information. In addition, th… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 2 figures, 2 tables

  50. arXiv:2406.07895  [pdf, other

    cs.CV

    Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation

    Authors: Jiadong Liang, Feng Lu

    Abstract: Vivid talking face generation holds immense potential applications across diverse multimedia domains, such as film and game production. While existing methods accurately synchronize lip movements with input audio, they typically ignore crucial alignments between emotion and facial cues, which include expression, gaze, and head pose. These alignments are indispensable for synthesizing realistic vid… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.