
Showing 1–50 of 720 results for author: Yang, D

Searching in archive cs.
  1. arXiv:2409.03218  [pdf, other]

    cs.PF cs.LG

    Application Research On Real-Time Perception Of Device Performance Status

    Authors: Zhe Wang, Zhen Wang, Jianwen Wu, Wangzhong Xiao, Yidong Chen, Zihua Feng, Dian Yang, Hongchen Liu, Bo Liang, Jiaojiao Fu

    Abstract: In order to accurately identify the performance status of mobile devices and finely adjust the user experience, a real-time performance perception evaluation method based on TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) combined with entropy weighting method and time series model construction was studied. After collecting the performance characteristics of various mobile…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.01957  [pdf, ps, other]

    cs.IT eess.SP

    Power Control and Random Serving Mode Allocation for CJT-NCJT Hybrid Mode Enabled Cell-Free Massive MIMO With Limited Fronthauls

    Authors: Hangyu Zhang, Rui Zhang, Yongzhao Li, Yuhan Ruan, Tao Li, Dong Yang

    Abstract: With a great potential of improving the service fairness and quality for user equipments (UEs), cell-free massive multiple-input multiple-output (mMIMO) has been regarded as an emerging candidate for 6G network architectures. Under ideal assumptions, the coherent joint transmission (CJT) serving mode has been considered as an optimal option for cell-free mMIMO systems, since it can achieve coheren…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 6 pages, 2 figures, accepted by GLOBECOM 2024

  3. AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction

    Authors: Yuchen Shi, Guochao Jiang, Tian Qiu, Deqing Yang

    Abstract: The relation extraction (RE) in complex scenarios faces challenges such as diverse relation types and ambiguous relations between entities within a single sentence, leading to the poor performance of pure "text-in, text-out" language models (LMs). To address these challenges, in this paper, we propose an agent-based RE framework, namely AgentRE, which fully leverages the potential of large languag…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by CIKM 2024

  4. arXiv:2409.00933  [pdf, other]

    cs.SD eess.AS

    SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis

    Authors: Haohan Guo, Fenglong Xie, Kun Xie, Dongchao Yang, Dake Guo, Xixin Wu, Helen Meng

    Abstract: The long speech sequence has been troubling language models (LM) based TTS approaches in terms of modeling complexity and efficiency. This work proposes SoCodec, a semantic-ordered multi-stream speech codec, to address this issue. It compresses speech into a shorter, multi-stream discrete semantic sequence with multiple tokens at each frame. Meanwhile, the ordered product quantization is proposed…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted by SLT 2024

  5. arXiv:2409.00897  [pdf, other]

    cs.NI cs.CR cs.ET

    Infiltrating the Sky: Data Delay and Overflow Attacks in Earth Observation Constellations

    Authors: Xiaojian Wang, Ruozhou Yu, Dejun Yang, Guoliang Xue

    Abstract: Low Earth Orbit (LEO) Earth Observation (EO) satellites have changed the way we monitor Earth. Acting like moving cameras, EO satellites are formed in constellations with different missions and priorities, and capture vast data that needs to be transmitted to the ground for processing. However, EO satellites have very limited downlink communication capability, limited by transmission bandwidth, nu…

    Submitted 1 September, 2024; originally announced September 2024.

  6. arXiv:2409.00355  [pdf, other]

    cs.CL

    YA-TA: Towards Personalized Question-Answering Teaching Assistants using Instructor-Student Dual Retrieval-augmented Knowledge Fusion

    Authors: Dongil Yang, Suyeon Lee, Minjin Kim, Jungsoo Won, Namyoung Kim, Dongha Lee, Jinyoung Yeo

    Abstract: Engagement between instructors and students plays a crucial role in enhancing students' academic performance. However, instructors often struggle to provide timely and personalized support in large classes. To address this challenge, we propose a novel Virtual Teaching Assistant (VTA) named YA-TA, designed to offer responses to students that are grounded in lectures and are easy to understand. To f…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 9 pages, 5 figures

  7. arXiv:2409.00138  [pdf, other]

    cs.CL cs.AI cs.CR

    PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action

    Authors: Yijia Shao, Tianshi Li, Weiyan Shi, Yanchen Liu, Diyi Yang

    Abstract: As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in accordance with the contextual privacy norms becomes increasingly critical. However, quantifying the privacy norm awareness of LMs and the emerging privacy risk in LM-mediated communication is challe…

    Submitted 29 August, 2024; originally announced September 2024.

    Comments: Under review

  8. arXiv:2408.14622  [pdf, other]

    cs.CL

    What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation

    Authors: Dingyi Yang, Qin Jin

    Abstract: With the development of artificial intelligence, particularly the success of Large Language Models (LLMs), the quantity and quality of automatically generated stories have significantly increased. This has led to the need for automatic story evaluation to assess the generative capabilities of computing systems and analyze the quality of both automatic-generated and human-written stories. Evaluatin…

    Submitted 26 August, 2024; originally announced August 2024.

    ACM Class: A.1; I.2.7; I.2.10

  9. arXiv:2408.13893  [pdf, other]

    cs.SD cs.CL eess.AS

    SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models

    Authors: Dongchao Yang, Rongjie Huang, Yuanyuan Wang, Haohan Guo, Dading Chong, Songxiang Liu, Xixin Wu, Helen Meng

    Abstract: Scaling Text-to-speech (TTS) to large-scale datasets has been demonstrated as an effective method for improving the diversity and naturalness of synthesized speech. At the high level, previous large-scale TTS models can be categorized into either Auto-regressive (AR) based (e.g., VALL-E) or Non-auto-regressive (NAR) based models (e.g., NaturalSpeech 2/3). Although these works dem…

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Submitted to TASLP

  10. arXiv:2408.13782  [pdf]

    eess.IV cs.CV physics.optics

    Batch-FPM: Random batch-update multi-parameter physical Fourier ptychography neural network

    Authors: Ruiqing Sun, Delong Yang, Yiyan Su, Shaohui Zhang, Qun Hao

    Abstract: Fourier Ptychographic Microscopy (FPM) is a computational imaging technique that enables high-resolution imaging over a large field of view. However, its application in the biomedical field has been limited due to the long image reconstruction time and poor noise robustness. In this paper, we propose a fast and robust FPM reconstruction method based on physical neural networks with batch update st…

    Submitted 25 August, 2024; originally announced August 2024.

  11. arXiv:2408.13195  [pdf, other]

    cs.AR cs.LG

    NAS-Cap: Deep-Learning Driven 3-D Capacitance Extraction with Neural Architecture Search and Data Augmentation

    Authors: Haoyuan Li, Dingcheng Yang, Chunyan Pei, Wenjian Yu

    Abstract: More accurate capacitance extraction is demanded for designing integrated circuits under advanced process technology. The pattern matching approach and the field solver for capacitance extraction have the drawbacks of inaccuracy and large computational cost, respectively. Recent work \cite{yang2023cnn} proposes a grid-based data representation and a convolutional neural network (CNN) based capacit…

    Submitted 23 August, 2024; originally announced August 2024.

  12. arXiv:2408.12325  [pdf, other]

    cs.CL

    Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators

    Authors: Dingkang Yang, Dongling Xiao, Jinjie Wei, Mingcheng Li, Zhaoyu Chen, Ke Li, Lihua Zhang

    Abstract: Despite their remarkable capabilities, Large Language Models (LLMs) are prone to generate responses that contradict verifiable facts, i.e., unfaithful hallucination content. Existing efforts generally focus on optimizing model parameters or editing semantic representations, which compromise the internal factual knowledge of target LLMs. In addition, hallucinations typically exhibit multifaceted pa…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Hallucination Mitigation in LLMs

  13. arXiv:2408.12109  [pdf, other]

    cs.CV cs.CL

    RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data

    Authors: Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, Qiaozhi He, Tong Xiao, Chunliang Zhang, Tongran Liu, Quan Du, Di Yang, Jingbo Zhu

    Abstract: Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using human-preference alignment techniques, such as best-of-n sampling and reinforcement learning. However, these techniques face the difficulty arising from the sc…

    Submitted 21 August, 2024; originally announced August 2024.

  14. arXiv:2408.12056  [pdf, other]

    cs.SE cs.AI

    Enhancing LLM-Based Automated Program Repair with Design Rationales

    Authors: Jiuang Zhao, Donghao Yang, Li Zhang, Xiaoli Lian, Zitian Yang

    Abstract: Automatic Program Repair (APR) endeavors to autonomously rectify issues within specific projects, which generally encompasses three categories of tasks: bug resolution, new feature development, and feature enhancement. Despite extensive research proposing various methodologies, their efficacy in addressing real issues remains unsatisfactory. It's worth noting that, typically, engineers have design…

    Submitted 21 August, 2024; originally announced August 2024.

  15. arXiv:2408.11505  [pdf, other]

    cs.CV

    MSCPT: Few-shot Whole Slide Image Classification with Multi-scale and Context-focused Prompt Tuning

    Authors: Minghao Han, Linhao Qu, Dingkang Yang, Xukun Zhang, Xiaoying Wang, Lihua Zhang

    Abstract: Multiple instance learning (MIL) has become a standard paradigm for weakly supervised classification of whole slide images (WSI). However, this paradigm relies on the use of a large number of labelled WSIs for training. The lack of training data and the presence of rare diseases present significant challenges for these methods. Prompt tuning combined with the pre-trained Vision-Language models (VL…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 11 pages, 5 figures, 5 tables

  16. arXiv:2408.11210  [pdf, other]

    cs.CV

    A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation

    Authors: Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Daguang Xu, Wenqi Li

    Abstract: Since the release of Segment Anything 2 (SAM2), the medical imaging community has been actively evaluating its performance for 3D medical image segmentation. However, different studies have employed varying evaluation pipelines, resulting in conflicting outcomes that obscure a clear understanding of SAM2's capabilities and potential applications. We shortly review existing benchmarks and point out…

    Submitted 20 August, 2024; originally announced August 2024.

  17. arXiv:2408.09395  [pdf, other]

    cs.CV

    OU-CoViT: Copula-Enhanced Bi-Channel Multi-Task Vision Transformers with Dual Adaptation for OU-UWF Images

    Authors: Yang Li, Jianing Deng, Chong Zhong, Danjuan Yang, Meiyan Li, A. H. Welsh, Aiyi Liu, Xingtao Zhou, Catherine C. Liu, Bo Fu

    Abstract: Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging and joint modeling of multiple discrete and continuous clinical scores presents a promising new paradigm for multi-task problems in Ophthalmology. The bi-channel framework that arises from the Ophthalmic phenomenon of "interocular asymmetries" of both eyes (OU) calls for new employment on the SOTA transformer-based models.…

    Submitted 18 August, 2024; originally announced August 2024.

  18. arXiv:2408.09122  [pdf, other]

    cs.CV

    MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation

    Authors: Xiao Zhao, Xukun Zhang, Dingkang Yang, Mingyang Sun, Mingcheng Li, Shunli Wang, Lihua Zhang

    Abstract: Accurate and robust multimodal multi-task perception is crucial for modern autonomous driving systems. However, current multimodal perception research follows independent paradigms designed for specific perception tasks, leading to a lack of complementary learning among tasks and decreased performance in multi-task learning (MTL) due to joint training. In this paper, we propose MaskBEV, a masked a…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024

  19. HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction

    Authors: Xiao Zhao, Bo Chen, Mingyang Sun, Dingkang Yang, Youxing Wang, Xukun Zhang, Mingcheng Li, Dongliang Kou, Xiaoyi Wei, Lihua Zhang

    Abstract: Vision-based 3D semantic scene completion (SSC) describes autonomous driving scenes through 3D volume representations. However, the occlusion of invisible voxels by scene surfaces poses challenges to current SSC methods in hallucinating refined 3D geometry. This paper proposes HybridOcc, a hybrid 3D volume query proposal method generated by Transformer framework and NeRF representation and refined…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted to IEEE RAL

  20. An Unsupervised Learning Framework Combined with Heuristics for the Maximum Minimal Cut Problem

    Authors: Huaiyuan Liu, Xianzhang Liu, Donghua Yang, Hongzhi Wang, Yingchi Long, Mengtong Ji, Dongjing Miao, Zhiyu Liang

    Abstract: The Maximum Minimal Cut Problem (MMCP), a NP-hard combinatorial optimization (CO) problem, has not received much attention due to the demanding and challenging bi-connectivity constraint. Moreover, as a CO problem, it is also a daunting task for machine learning, especially without labeled instances. To deal with these problems, this work proposes an unsupervised learning framework combined with h…

    Submitted 15 August, 2024; originally announced August 2024.

  21. arXiv:2408.08152  [pdf, other]

    cs.CL cs.AI cs.LG cs.LO

    DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

    Authors: Huajian Xin, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, Z. F. Wu, Fuli Luo, Chong Ruan

    Abstract: We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-…

    Submitted 15 August, 2024; originally announced August 2024.

  22. arXiv:2408.04914  [pdf, other]

    cs.CV

    GuidedNet: Semi-Supervised Multi-Organ Segmentation via Labeled Data Guide Unlabeled Data

    Authors: Haochen Zhao, Hui Meng, Deqian Yang, Xiaozheng Xie, Xiaoze Wu, Qingfeng Li, Jianwei Niu

    Abstract: Semi-supervised multi-organ medical image segmentation aids physicians in improving disease diagnosis and treatment planning and reduces the time and effort required for organ annotation. Existing state-of-the-art methods train the labeled data with ground truths and train the unlabeled data with pseudo-labels. However, the two training flows are separate, which does not reflect the interrelationsh…

    Submitted 2 September, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM2024, 10 pages, 5 figures

  23. arXiv:2408.04686  [pdf, other]

    cs.CL cs.AI

    Multi-Turn Context Jailbreak Attack on Large Language Models From First Principles

    Authors: Xiongtao Sun, Deyue Zhang, Dongdong Yang, Quanchen Zou, Hui Li

    Abstract: Large language models (LLMs) have significantly enhanced the performance of numerous applications, from intelligent conversations to text generation. However, their inherent security vulnerabilities have become an increasingly significant challenge, especially with respect to jailbreak attacks. Attackers can circumvent the security mechanisms of these LLMs, breaching security constraints and causi…

    Submitted 8 August, 2024; originally announced August 2024.

  24. arXiv:2408.02024  [pdf, other]

    cs.CV

    Faster Diffusion Action Segmentation

    Authors: Shuaibing Wang, Shunli Wang, Mingcheng Li, Dingkang Yang, Haopeng Kuang, Ziyun Qian, Lihua Zhang

    Abstract: Temporal Action Segmentation (TAS) is an essential task in video analysis, aiming to segment and classify continuous frames into distinct action segments. However, the ambiguous boundaries between actions pose a significant challenge for high-precision segmentation. Recent advances in diffusion models have demonstrated substantial success in TAS tasks due to their stable training process and high-…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 25 pages, 6 figures

  25. arXiv:2408.02023  [pdf, other]

    cs.CR

    A Smart City Infrastructure Ontology for Threats, Cybercrime, and Digital Forensic Investigation

    Authors: Yee Ching Tok, Davis Zheng Yang, Sudipta Chattopadhyay

    Abstract: Cybercrime and the market for cyber-related compromises are becoming attractive revenue sources for state-sponsored actors, cybercriminals and technical individuals affected by financial hardships. Due to burgeoning cybercrime on new technological frontiers, efforts have been made to assist digital forensic investigators (DFI) and law enforcement agencies (LEA) in their investigative efforts. Fo…

    Submitted 4 August, 2024; originally announced August 2024.

  26. arXiv:2408.00441  [pdf, other]

    cs.CV cs.AI

    Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval

    Authors: Gangyan Zeng, Yuan Zhang, Jin Wei, Dongbao Yang, Peng Zhang, Yiwen Gao, Xugong Qin, Yu Zhou

    Abstract: Scene text retrieval aims to find all images containing the query text from an image gallery. Current efforts tend to adopt an Optical Character Recognition (OCR) pipeline, which requires complicated text detection and/or recognition processes, resulting in inefficient and inflexible retrieval. Different from them, in this work we propose to explore the intrinsic potential of Contrastive Language-…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

  27. arXiv:2407.17817  [pdf, other]

    cs.CL cs.LG

    Demystifying Verbatim Memorization in Large Language Models

    Authors: Jing Huang, Diyi Yang, Christopher Potts

    Abstract: Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications. Much prior work has studied such verbatim memorization using observational data. To complement such work, we develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences. We find that (1…

    Submitted 25 July, 2024; originally announced July 2024.

  28. arXiv:2407.12403  [pdf, ps, other]

    quant-ph cs.IT

    Reliability Function of Classical-Quantum Channels

    Authors: Ke Li, Dong Yang

    Abstract: Reliability function, defined as the optimal error exponent describing the exponential decay of decoding error probability when the communicating rate is below the capacity of the channel, is one of the fundamental problems in information theory. In this work, we determine the reliability function for a general cq channel. The main contribution is a lower bound for the error exponent which is char…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 15 pages, no figure. See the independent work arXiv:2407.11118 by Joseph M. Renes

  29. arXiv:2407.12248  [pdf, other]

    cs.DC

    Mitigating Interference of Microservices with a Scoring Mechanism in Large-scale Clusters

    Authors: Dingyu Yang, Kangpeng Zheng, Shiyou Qian, Jian Cao, Guangtao Xue

    Abstract: Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitute the principal approach for enhancing resource utilization in production. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we ob…

    Submitted 16 July, 2024; originally announced July 2024.

  30. arXiv:2407.11300  [pdf, other]

    cs.CV cs.AI

    Large Vision-Language Models as Emotion Recognizers in Context Awareness

    Authors: Yuxuan Lei, Dingkang Yang, Zhaoyu Chen, Jiawei Chen, Peng Zhai, Lihua Zhang

    Abstract: Context-aware emotion recognition (CAER) is a complex and significant task that requires perceiving emotions from various contextual cues. Previous approaches primarily focus on designing sophisticated architectures to extract emotional cues from images. However, their knowledge is confined to specific training datasets and may reflect the subjective emotional biases of the annotators. Furthermore…

    Submitted 15 July, 2024; originally announced July 2024.

  31. arXiv:2407.10814  [pdf, other]

    cs.CV

    Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot Whole Slide Image Classification

    Authors: Linhao Qu, Dingkang Yang, Dan Huang, Qinhao Guo, Rongkui Luo, Shaoting Zhang, Xiaosong Wang

    Abstract: Current multi-instance learning algorithms for pathology image analysis often require a substantial number of Whole Slide Images for effective training but exhibit suboptimal performance in scenarios with limited learning data. In clinical settings, restricted access to pathology slides is inevitable due to patient privacy concerns and the prevalence of rare or emerging diseases. The emergence of…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  32. arXiv:2407.09285  [pdf, other]

    cs.CV

    MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results

    Authors: Jiangpeng He, Yuhao Chen, Gautham Vinod, Talha Ibn Mahmud, Fengqing Zhu, Edward Delp, Alexander Wong, Pengcheng Xi, Ahmad AlMughrabi, Umair Haroon, Ricardo Marques, Petia Radeva, Jiadong Tang, Dianyi Yang, Yu Gao, Zhaoxiang Liang, Yawei Jueluo, Chengyu Shi, Pengyu Wang

    Abstract: The increasing interest in computer vision applications for nutrition and dietary monitoring has led to the development of advanced 3D reconstruction techniques for food items. However, the scarcity of high-quality data and limited collaboration between industry and academia have constrained progress in this field. Building on recent advancements in 3D reconstruction, we host the MetaFood Workshop…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Technical report for MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction. arXiv admin note: substantial text overlap with arXiv:2407.01717

  33. arXiv:2407.08926  [pdf, other]

    cs.IR

    Toward Automatic Group Membership Annotation for Group Fairness Evaluation

    Authors: Fumian Chen, Dayu Yang, Hui Fang

    Abstract: With the increasing research attention on fairness in information retrieval systems, more and more fairness-aware algorithms have been proposed to ensure fairness for a sustainable and healthy retrieval ecosystem. However, as the most adopted measurement of fairness-aware algorithms, group fairness evaluation metrics, require group membership information that needs massive human annotations and is…

    Submitted 11 July, 2024; originally announced July 2024.

    Journal ref: NLDB2024

  34. arXiv:2407.07026  [pdf, other]

    cs.CV cs.CL cs.MM cs.SI

    Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition

    Authors: Daiqing Wu, Dongbao Yang, Huawen Shen, Can Ma, Yu Zhou

    Abstract: With the proliferation of social media posts in recent years, the need to detect sentiments in multimodal (image-text) content has grown rapidly. Since posts are user-generated, the image and text from the same post can express different or even contradictory sentiments, leading to potential sentiment discrepancy. However, existing works mainly adopt a single-branch fusion structure that…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures

  35. arXiv:2407.06084  [pdf, other]

    cs.CV

    3D Vision and Language Pretraining with Large-Scale Synthetic Data

    Authors: Dejie Yang, Zhu Xu, Wentao Mo, Qingchao Chen, Siyuan Huang, Yang Liu

    Abstract: 3D Vision-Language Pre-training (3D-VLP) aims to provide a pre-train model which can bridge 3D scenes with natural language, which is an important technique for embodied intelligence. However, current 3D-VLP datasets are hindered by limited scene-level diversity and insufficient fine-grained annotations (only 1.2K scenes and 280K textual annotations in ScanScribe), primarily due to the labor-inten…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: accepted by IJCAI2024

  36. arXiv:2407.05352  [pdf, other]

    cs.CV cs.MM

    Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model

    Authors: Danni Yang, Ruohan Dong, Jiayi Ji, Yiwei Ma, Haowei Wang, Xiaoshuai Sun, Rongrong Ji

    Abstract: Recently, diffusion models have increasingly demonstrated their capabilities in vision understanding. By leveraging prompt-based learning to construct sentences, these models have shown proficiency in classification and visual grounding tasks. However, existing approaches primarily showcase their ability to perform sentence-level localization, leaving the potential for leveraging contextual inform…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  37. arXiv:2407.04963  [pdf, other]

    cs.CV

    Towards Context-Aware Emotion Recognition Debiasing from a Causal Demystification Perspective via De-confounded Training

    Authors: Dingkang Yang, Kun Yang, Haopeng Kuang, Zhaoyu Chen, Yuzheng Wang, Lihua Zhang

    Abstract: Understanding emotions from diverse contexts has received widespread attention in computer vision communities. The core philosophy of Context-Aware Emotion Recognition (CAER) is to provide valuable semantic cues for recognizing the emotions of target persons by leveraging rich contextual information. Current approaches invariably focus on designing sophisticated structures to extract perceptually…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: TPAMI 2024

  38. arXiv:2407.04955  [pdf, other]

    cs.CV

    Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations

    Authors: Dingkang Yang, Mingcheng Li, Linhao Qu, Kun Yang, Peng Zhai, Song Wang, Lihua Zhang

    Abstract: Understanding human intentions (e.g., emotions) from videos has received considerable attention recently. Video streams generally constitute a blend of temporal data stemming from distinct modalities, including natural language, facial expressions, and auditory clues. Despite the impressive advancements of previous works via attention-based paradigms, the inherent temporal asynchrony and modality…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: TCSVT 2024

  39. arXiv:2407.03384  [pdf, other]

    physics.flu-dyn cs.CE

    Topological Separation of Vortices

    Authors: Adeel Zafar, Zahra Poorshayegh, Di Yang, Guoning Chen

    Abstract: Vortices and their analysis play a critical role in the understanding of complex phenomena in turbulent flows. Traditional vortex extraction methods, notably region-based techniques, often overlook the entanglement phenomenon, resulting in the inclusion of multiple vortices within a single extracted region. Their separation is necessary for quantifying different types of vortices and their statist…

    Submitted 6 August, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted for presentation at IEEE Visualization (VIS) 2024 short paper track and will appear in the conference proceedings

  40. arXiv:2407.03103  [pdf, other]

    cs.CL

    Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

    Authors: Suyeon Lee, Sunghwan Kim, Minju Kim, Dongjin Kang, Dongil Yang, Harim Kim, Minseok Kang, Dayi Jung, Min Hee Kim, Seungbeen Lee, Kyoung-Mee Chung, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Recently, the demand for psychological counseling has significantly increased as more individuals express concerns about their mental health. This surge has accelerated efforts to improve the accessibility of counseling by using large language models (LLMs) as counselors. To ensure client privacy, training open-source LLMs faces a key challenge: the absence of realistic counseling datasets. To add…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Under Review

  41. arXiv:2407.02996  [pdf, other]

    cs.CL cs.AI

    Are Large Language Models Consistent over Value-laden Questions?

    Authors: Jared Moore, Tanvi Deshpande, Diyi Yang

    Abstract: Large language models (LLMs) appear to bias their survey answers toward certain values. Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are they? To answer, we first define value consistency as the similarity of answers across (1) paraphrases of one question, (2) related questions under one topic, (3) multiple-choice and open-ended use-cases of one question, a…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, 9 figures

  42. arXiv:2407.01930  [pdf, other]

    cs.CV

    Self-Cooperation Knowledge Distillation for Novel Class Discovery

    Authors: Yuzheng Wang, Zhaoyu Chen, Dingkang Yang, Yunquan Sun, Lizhe Qi

    Abstract: Novel Class Discovery (NCD) aims to discover unknown and novel classes in an unlabeled set by leveraging knowledge already learned about known classes. Existing works focus on instance-level or class-level knowledge representation and build a shared representation space to achieve performance improvements. However, a long-neglected issue is the potential imbalanced number of samples from known and…

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  43. arXiv:2407.01111  [pdf, other]

    cs.LG cs.AI stat.ML

    Proximity Matters: Local Proximity Preserved Balancing for Treatment Effect Estimation

    Authors: Hao Wang, Zhichao Chen, Yuan Shen, Jiajun Fan, Zhaoran Liu, Degui Yang, Xinggao Liu, Haoxuan Li

    Abstract: Heterogeneous treatment effect (HTE) estimation from observational data poses significant challenges due to treatment selection bias. Existing methods address this bias by minimizing distribution discrepancies between treatment groups in latent space, focusing on global alignment. However, the fruitful aspect of local proximity, where similar units exhibit similar outcomes, is often overlooked. In…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Code is available at https://anonymous.4open.science/status/ncr-B697

  44. arXiv:2407.00870  [pdf, other]

    cs.CL cs.HC

    Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles

    Authors: Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, Diyi Yang

    Abstract: Recent works leverage LLMs to roleplay realistic social scenarios, aiding novices in practicing their social skills. However, simulating sensitive interactions, such as in mental health, is challenging. Privacy concerns restrict data access, and collecting expert feedback, although vital, is laborious. To address this, we develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits…

    Submitted 14 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: 34 pages, 24 figures, 11 Tables

  45. arXiv:2406.18921  [pdf, other]

    cs.CL

    Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data

    Authors: Yiting Ran, Xintao Wang, Rui Xu, Xinfeng Yuan, Jiaqing Liang, Yanghua Xiao, Deqing Yang

    Abstract: Role-playing agents (RPA) have been a popular application area for large language models (LLMs), attracting significant interest from both industry and academia. While existing RPAs well portray the characters' knowledge and tones, they face challenges in capturing their minds, especially for small role-playing language models (RPLMs). In this paper, we propose to enhance RPLMs via personality-indi…

    Submitted 29 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages

  46. arXiv:2406.17271  [pdf, other]

    cs.CL

    DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph

    Authors: Zhehao Zhang, Jiaao Chen, Diyi Yang

    Abstract: The current paradigm of evaluating Large Language Models (LLMs) through static benchmarks comes with significant limitations, such as vulnerability to data contamination and a lack of adaptability to the evolving capabilities of LLMs. Therefore, evaluation methods that can adapt and generate evaluation data with controlled complexity are urgently needed. In this work, we introduce Dynamic Evaluati…

    Submitted 25 June, 2024; originally announced June 2024.

  47. arXiv:2406.16992  [pdf, other]

    cs.LG cs.AI

    Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction

    Authors: Yicheng Zhou, Pengfei Wang, Hao Dong, Denghui Zhang, Dingqi Yang, Yanjie Fu, Pengyang Wang

    Abstract: Urban traffic speed prediction aims to estimate the future traffic speed for improving urban transportation services. Enormous efforts have been made to exploit Graph Neural Networks (GNNs) for modeling spatial correlations and temporal dependencies of traffic speed evolving patterns, regularized by graph topology. While achieving promising results, current traffic speed prediction methods still su…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted to IJCAI 2024

  48. arXiv:2406.15769  [pdf, other]

    cs.DC

    Humas: A Heterogeneity- and Upgrade-aware Microservice Auto-scaling Framework in Large-scale Data Centers

    Authors: Qin Hua, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue, Minglu Li

    Abstract: An effective auto-scaling framework is essential for microservices to ensure performance stability and resource efficiency under dynamic workloads. As revealed by many prior studies, the key to efficient auto-scaling lies in accurately learning performance patterns, i.e., the relationship between performance metrics and workloads in data-driven schemes. However, we notice that there are two signif…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages; 27 figures

  49. arXiv:2406.14958  [pdf, other]

    cs.CV

    Skip and Skip: Segmenting Medical Images with Prompts

    Authors: Jiawei Chen, Dingkang Yang, Yuxuan Lei, Lihua Zhang

    Abstract: Most medical image lesion segmentation methods rely on hand-crafted accurate annotations of the original image for supervised learning. Recently, a series of weakly supervised or unsupervised methods have been proposed to reduce the dependence on pixel-level annotations. However, these methods are essentially based on pixel-level annotation, ignoring the image-level diagnostic results of the curre…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Work in progress

  50. arXiv:2406.14282  [pdf, other]

    cs.CL cs.AI

    Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

    Authors: Junjie Wang, Mingyang Chen, Binbin Hu, Dan Yang, Ziqi Liu, Yue Shen, Peng Wei, Zhiqiang Zhang, Jinjie Gu, Jun Zhou, Jeff Z. Pan, Wen Zhang, Huajun Chen

    Abstract: Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fin…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress