Search | arXiv e-print repository

RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation

Authors: Qingyao Li, Wei Xia, Kounianhua Du, Xinyi Dai, Ruiming Tang, Yasheng Wang, Yong Yu, Weinan Zhang

Abstract: LLM agents enhanced by tree search algorithms have yielded notable performance in code generation. However, current search algorithms in this domain suffer from low search quality for several reasons: 1) ineffective design of the search space for the high reasoning demands of code generation tasks, 2) inadequate integration of code feedback with the search algorithm, and 3) poor handling of negative feedback during the search, leading to reduced search efficiency and quality. To address these challenges, we propose to search for the reasoning process of the code and use the detailed feedback of code execution to refine erroneous thoughts during the search. In this paper, we introduce RethinkMCTS, which employs the Monte Carlo Tree Search (MCTS) algorithm to conduct thought-level searches before generating code, thereby exploring a wider range of strategies. More importantly, we construct verbal feedback from fine-grained code execution feedback to refine erroneous thoughts during the search. This ensures that the search progresses along the correct reasoning paths, thus improving the overall search quality of the tree by leveraging execution feedback. Through extensive experiments, we demonstrate that RethinkMCTS outperforms previous search-based and feedback-based code generation baselines. On the HumanEval dataset, it improves the pass@1 of GPT-3.5-turbo from 70.12 to 89.02 and GPT-4o-mini from 87.20 to 94.51.
It effectively conducts more thorough exploration through thought-level searches and enhances the search quality of the entire tree by incorporating the rethink operation.
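For reference on the MCTS component, the following is a minimal sketch of the textbook UCT (UCB1) selection rule and reward backpropagation used in generic Monte Carlo Tree Search. The `Node` fields, the exploration constant `c`, and the use of a test pass rate as the reward are assumptions for illustration; the paper's own selection formula, thought expansion, and rethink operation are not reproduced here.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A search-tree node; `thought` stands in for one reasoning step (hypothetical field)."""
    thought: str
    visits: int = 0
    value_sum: float = 0.0
    children: list[Node] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.41) -> float:
    """Standard UCB1 score: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node: Node, c: float = 1.41) -> Node:
    """Pick the child with the highest UCT score."""
    return max(node.children, key=lambda ch: uct_score(ch, node.visits, c))


def backpropagate(path: list[Node], reward: float) -> None:
    """Add a rollout reward (e.g. a hypothetical fraction of tests passed) to every node on the path."""
    for n in path:
        n.visits += 1
        n.value_sum += reward
```

Because both children share the same parent visit count, a child with equal visits but a higher mean value wins, while an unvisited child always wins first; that is the exploration/exploitation balance UCT is designed for.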

Submitted 14 September, 2024; originally announced September 2024.

Comments: 11 pages, 4 figures

arXiv:2409.03565 [pdf]

Surface Magnetism in Fe$_3$GeTe$_2$ Crystals

Authors: T. A. Tyson, S. Amarasinghe, AM M. Abeykoon, R. Lalancette, S. K. Du, X. Fang, S. -W. Cheong, A. Al-Mahboob, J. T. Sadowski

Abstract: The surface magnetization of Fe$_3$GeTe$_2$ was examined by low-energy electron microscopy (LEEM) using an off-normal incidence electron beam. We found that the 180$^\circ$ domain walls are of Bloch type. Temperature-dependent LEEM measurements yield a surface magnetization with a surface critical exponent $\beta_1 = 0.79 \pm 0.02$. This result is consistent with surface magnetism in the 3D semi-infinite Heisenberg ($\beta_1 = 0.84 \pm 0.01$) or Ising ($\beta_1 = 0.78 \pm 0.02$) models, and is distinctly different from the bulk exponent ($\beta = 0.34 \pm 0.07$). The measurements reveal the power of LEEM with a tilted beam to determine magnetic domain structure in quantum materials. Single-crystal diffraction measurements reveal weak peaks that break inversion symmetry and yield space group $P\bar{6}m2$. This loss of inversion symmetry, derived from Fe-site defects, enables the formation of skyrmions in this Fe$_3$GeTe$_2$ crystal.
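A critical exponent such as $\beta_1$ describes the power law $M_s(T) \propto (1 - T/T_c)^{\beta_1}$ below the transition temperature. The sketch below shows one simple way to extract such an exponent: a log-log least-squares fit, assuming $T_c$ is already known and every point lies in the power-law regime. Real analyses typically also fit $T_c$ and restrict the temperature window, and the data in the test are synthetic, not the paper's measurements.

```python
import math


def fit_critical_exponent(temps, mags, t_c):
    """Least-squares slope of log M versus log(1 - T/T_c).

    Assumes M(T) = M0 * (1 - T/T_c)**beta holds for all points and that T_c is
    known; only points with T < T_c and M > 0 are used. Returns (beta, M0).
    """
    xs, ys = [], []
    for t, m in zip(temps, mags):
        if t < t_c and m > 0:
            xs.append(math.log(1.0 - t / t_c))
            ys.append(math.log(m))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points below T_c")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # ordinary least-squares slope and intercept in log-log space
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx
    m0 = math.exp(y_mean - beta * x_mean)
    return beta, m0
```

On noise-free synthetic data generated with a known exponent, the fit recovers that exponent exactly; with real data, the result depends strongly on the chosen $T_c$ and fitting window.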

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2409.03193 [pdf, other]

Upper-Limb Rehabilitation with a Dual-Mode Individualized Exoskeleton Robot: A Generative-Model-Based Solution

Authors: Yu Chen, Shu Miao, Jing Ye, Gong Chen, Jianghua Cheng, Ketao Du, Xiang Li

Abstract: Several upper-limb exoskeleton robots have been developed for stroke rehabilitation, but their rather low level of individualized assistance typically limits their effectiveness and practicability. Individualized assistance involves an upper-limb exoskeleton robot continuously assessing feedback from a stroke patient and then meticulously adjusting interaction forces to suit specific conditions and online changes. This paper describes the development of a new upper-limb exoskeleton robot with a novel online generative capability that allows it to provide individualized assistance to support the rehabilitation training of stroke patients. Specifically, the upper-limb exoskeleton robot exploits generative models to customize a fine, well-fitted trajectory for each patient, since medical conditions, responses, and comfort feedback during training generally differ between patients. This generative capability is integrated into the two working modes of the upper-limb exoskeleton robot: an active mirroring mode for patients who retain motor abilities on one side of the body and a passive following mode for patients who lack motor ability on both sides of the body. The performance of the upper-limb exoskeleton robot was demonstrated in experiments involving healthy subjects and stroke patients.

Submitted 4 September, 2024; originally announced September 2024.

arXiv:2408.17231 [pdf, other]

CondSeg: Ellipse Estimation of Pupil and Iris via Conditioned Segmentation

Authors: Zhuang Jia, Jiangfan Deng, Liying Chi, Xiang Long, Daniel K. Du

Abstract: Parsing of eye components (i.e., pupil, iris, and sclera) is fundamental for eye tracking and gaze estimation in AR/VR products. Mainstream approaches tackle this problem as a multi-class segmentation task, which provides only the visible part of the pupil/iris; other methods regress elliptical parameters using human-annotated full pupil/iris parameters. In this paper, we consider two priors: the projected full pupil/iris circle can be modelled as an ellipse (ellipse prior), and the visibility of the pupil/iris is controlled by the openness of the eye region (condition prior). Based on these priors, we design a novel method, CondSeg, to estimate the elliptical parameters of the pupil/iris directly from segmentation labels, without explicitly annotating full ellipses, and use the eye-region mask to control the visibility of the estimated pupil/iris ellipses. A conditioned segmentation loss is used to optimize the parameters by transforming parameterized ellipses into pixel-wise soft masks in a differentiable way. Our method is tested on public datasets (OpenEDS-2019/-2020), shows competitive results on segmentation metrics, and simultaneously provides accurate elliptical parameters for further eye-tracking applications.
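The step of turning a parameterized ellipse into a pixel-wise soft mask and then gating it with the eye-region mask can be illustrated with a minimal NumPy sketch. The sigmoid-of-implicit-function construction and the `sharpness` parameter are assumptions chosen for illustration; CondSeg's actual transformation and loss may differ. Every operation is smooth in the ellipse parameters, so the same code written in an autodiff framework would be differentiable.

```python
import numpy as np


def ellipse_soft_mask(h, w, cx, cy, a, b, theta, sharpness=10.0):
    """Render an ellipse as a soft mask in [0, 1] via a sigmoid of its implicit function.

    (cx, cy) is the center in pixels, a and b are the semi-axes, and theta is the
    rotation in radians. `sharpness` (hypothetical) controls how hard the edge is.
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    dx, dy = xs - cx, ys - cy
    # rotate pixel offsets into the ellipse's own axes
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    f = (u / a) ** 2 + (v / b) ** 2  # < 1 inside the ellipse, > 1 outside
    return 1.0 / (1.0 + np.exp(-sharpness * (1.0 - f)))


def conditioned_mask(ellipse_mask, eye_region_mask):
    """Keep only the part of the ellipse that is visible inside the eye-region mask."""
    return ellipse_mask * eye_region_mask
```

A soft mask produced this way can be compared against a visible-region segmentation label with any pixel-wise loss, which is the general idea of supervising full ellipses from partial labels.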

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2408.16403 [pdf, other]

DeepSPoC: A Deep Learning-Based PDE Solver Governed by Sequential Propagation of Chaos

Authors: Kai Du, Yongle Xie, Tao Zhou, Yuancheng Zhou

Abstract: Sequential propagation of chaos (SPoC) is a recently developed tool to solve mean-field stochastic differential equations and their related nonlinear Fokker-Planck equations. Based on the theory of SPoC, we present a new method (deepSPoC) that combines the interacting particle system of SPoC and deep learning. Under the framework of deepSPoC, two classes of frequently used deep models are considered: fully connected neural networks and normalizing flows. For high-dimensional problems, a spatially adaptive method is designed to further improve the accuracy and efficiency of deepSPoC. We analyze the convergence of the deepSPoC framework under some simplified conditions and also provide an a posteriori error estimate for the algorithm. Finally, we test our methods on a wide range of different types of mean-field equations.

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.16252 [pdf, ps, other]

A collision-oriented interacting particle system for Landau-type equations and the molecular chaos

Authors: Kai Du, Lei Li

Abstract: We propose a collision-oriented particle system to approximate a class of Landau-type equations. This particle system is formally derived from a particle system with random collisions in the grazing regime, and happens to be a special random batch system with random interaction in the diffusion coefficient. The difference from usual random batch systems with random interaction in the drift is that the batch size has to be $p=2$. We then analyze the convergence rate of the proposed particle system to the Landau-type equations using the tool of relative entropy, assuming that the interaction kernels are regular enough. A key aspect of our approach is the gradient estimates of logarithmic densities, applied to both the Landau-type equations and the particle systems. Compared to existing particle systems for the approximation of Landau-type equations, our proposed system not only offers a more intrinsic reflection of the underlying physics but also reduces the computational cost to $O(N)$ per time step when implemented numerically.
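The $O(N)$ cost per time step follows from the batch size $p=2$: at each step the particles are randomly split into disjoint pairs, and each particle interacts only with its partner. A minimal sketch of that pairing follows, assuming an even number of particles. The `update_pair` callback is hypothetical, and the averaging update in the test is a toy conservation check, not the Landau-type collision dynamics analyzed in the paper.

```python
import random


def random_pairs(n, rng):
    """Shuffle particle indices and split them into disjoint pairs (batch size p = 2).

    Assumes n is even. Each particle lands in exactly one pair, so building the
    pairs costs O(n) per time step instead of O(n^2) for all-pairs interaction.
    """
    if n % 2:
        raise ValueError("n must be even for a perfect pairing")
    idx = list(range(n))
    rng.shuffle(idx)
    return [(idx[k], idx[k + 1]) for k in range(0, n, 2)]


def step(particles, update_pair, rng):
    """Advance all particles by one time step using only within-pair interactions.

    `update_pair(x_i, x_j)` is a hypothetical callback returning the new states
    of a pair; the paper's collision-oriented dynamics would be plugged in here.
    """
    new = list(particles)
    for i, j in random_pairs(len(particles), rng):
        new[i], new[j] = update_pair(particles[i], particles[j])
    return new
```

Fresh pairs are drawn at every step, so over many steps each particle still interacts with many others on average, which is what random batch methods rely on.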

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.12161 [pdf, other]

Rebalancing Multi-Label Class-Incremental Learning

Authors: Kaile Du, Yifan Zhou, Fan Lyu, Yuyang Li, Junzhou Xie, Yixi Shen, Fuyuan Hu, Guangcan Liu

Abstract: Multi-label class-incremental learning (MLCIL) is essential for real-world multi-label applications, allowing models to continuously learn new labels while retaining previously learned knowledge. However, recent MLCIL approaches achieve only suboptimal performance because they overlook the positive-negative imbalance problem, which manifests at both the label and loss levels because of the task-level partial label issue. The imbalance at the label level arises from the substantial absence of negative labels, while the imbalance at the loss level stems from the asymmetric contributions of the positive and negative loss parts to the optimization. To address these issues, we propose a Rebalance framework for both the Loss and Label levels (RebLL), which integrates two key modules: asymmetric knowledge distillation (AKD) and online relabeling (OR). AKD rebalances at the loss level by emphasizing negative label learning in the classification loss and down-weighting the contribution of overconfident predictions in the distillation loss. OR is designed for label rebalancing: it restores the original class distribution in memory by relabeling the missing classes online. Our comprehensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate that this rebalancing strategy significantly improves performance, achieving new state-of-the-art results even with a vanilla CNN backbone.

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.08524 [pdf, other]

GS-ID: Illumination Decomposition on Gaussian Splatting via Diffusion Prior and Parametric Light Source Optimization

Authors: Kang Du, Zhihao Liang, Zeyu Wang

Abstract: We present GS-ID, a novel framework for illumination decomposition on Gaussian Splatting, achieving photorealistic novel view synthesis and intuitive light editing. Illumination decomposition is an ill-posed problem facing three main challenges: 1) priors for geometry and material are often lacking; 2) complex illumination conditions involve multiple unknown light sources; and 3) calculating surface shading with numerous light sources is computationally expensive. To address these challenges, we first introduce intrinsic diffusion priors to estimate the attributes for physically based rendering. Then we divide the illumination into environmental and direct components for joint optimization. Last, we employ deferred rendering to reduce the computational load. Our framework uses a learnable environment map and Spherical Gaussians (SGs) to represent light sources parametrically, therefore enabling controllable and photorealistic relighting on Gaussian Splatting. Extensive experiments and applications demonstrate that GS-ID produces state-of-the-art illumination decomposition results while achieving better geometry reconstruction and rendering performance.
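Spherical Gaussians represent light as a sum of lobes over directions. A common lobe form is $G(\mathbf{v}) = \mathbf{a}\, e^{\lambda(\boldsymbol{\mu}\cdot\mathbf{v} - 1)}$, with unit axis $\boldsymbol{\mu}$, sharpness $\lambda$, and amplitude $\mathbf{a}$. The minimal NumPy sketch below evaluates such a mixture in one direction, assuming that standard lobe form. How GS-ID parameterizes its lobes and combines them with the learnable environment map is not reproduced here.

```python
import numpy as np


def eval_spherical_gaussians(direction, axes, sharpness, amplitudes):
    """Evaluate a mixture of Spherical Gaussians in one direction.

    Each lobe is G(v) = a * exp(lambda * (dot(mu, v) - 1)), which equals a when v
    points along the unit axis mu and decays with angular distance.
    Shapes: direction (3,), axes (K, 3), sharpness (K,), amplitudes (K, C), e.g. C = 3 for RGB.
    """
    v = np.asarray(direction, dtype=float)
    v = v / np.linalg.norm(v)
    mu = np.asarray(axes, dtype=float)
    mu = mu / np.linalg.norm(mu, axis=1, keepdims=True)
    weights = np.exp(np.asarray(sharpness, dtype=float) * (mu @ v - 1.0))  # (K,)
    return weights @ np.asarray(amplitudes, dtype=float)  # (C,)
```

Because each lobe is a closed-form function of direction, a small number of SG parameters can stand in for many light sources, which is one reason SGs are a popular choice for parametric, editable lighting.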

Submitted 16 August, 2024; originally announced August 2024.

Comments: 15 pages, 13 figures

arXiv:2407.01245 [pdf, other]

SINKT: A Structure-Aware Inductive Knowledge Tracing Model with Large Language Model

Authors: Lingyue Fu, Hao Guan, Kounianhua Du, Jianghao Lin, Wei Xia, Weinan Zhang, Ruiming Tang, Yasheng Wang, Yong Yu

Abstract: Knowledge Tracing (KT) aims to determine whether students will respond correctly to the next question, which is a crucial task in intelligent tutoring systems (ITS). In educational KT scenarios, transductive ID-based methods often face severe data sparsity and cold start problems, where interactions between individual students and questions are sparse, and new questions and concepts continually arrive in the database. In addition, existing KT models only implicitly consider the correlation between concepts and questions, lacking direct modeling of the more complex relationships in the heterogeneous graph of concepts and questions. In this paper, we propose a Structure-aware Inductive Knowledge Tracing model with large language model (dubbed SINKT), which, for the first time, introduces large language models (LLMs) and realizes inductive knowledge tracing. Firstly, SINKT utilizes LLMs to introduce structural relationships between concepts and constructs a heterogeneous graph for concepts and questions. Secondly, by encoding concepts and questions with LLMs, SINKT incorporates semantic information to aid prediction. Finally, SINKT predicts the student's response to the target question by modeling the interaction between the student's knowledge state and the question representation. Experiments on four real-world datasets demonstrate that SINKT achieves state-of-the-art performance among 12 existing transductive KT models. Additionally, we explore the performance of SINKT on the inductive KT task and provide insights into various modules.

Submitted 23 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.18825 [pdf, other]

ELCoRec: Enhance Language Understanding with Co-Propagation of Numerical and Categorical Features for Recommendation

Authors: Jizheng Chen, Kounianhua Du, Jianghao Lin, Bo Chen, Ruiming Tang, Weinan Zhang

Abstract: Large language models have been flourishing in the natural language processing (NLP) domain, and their potential for recommendation has attracted much attention. Despite the intelligence shown by recommendation-oriented fine-tuned models, LLMs struggle to fully understand user behavior patterns due to their innate weakness in interpreting numerical features and the overhead of long contexts, so the temporal relations among user behaviors, subtle quantitative signals among different ratings, and various side features of items are not well explored. Existing works only fine-tune a single LLM on given text data without introducing this important information to it, leaving these problems unsolved. In this paper, we propose ELCoRec to Enhance Language understanding with CoPropagation of numerical and categorical features for Recommendation. Concretely, we propose to inject the preference understanding capability into the LLM via a GAT expert model, where the user preference is better encoded by propagating the temporal relations, rating signals, and various side information of historical items in parallel. The parallel propagation mechanism could stabilize heterogeneous features and offer an informative user preference encoding, which is then injected into the language model via soft prompting at the cost of a single token embedding. To further capture the user's recent interests, we propose a novel Recent interaction Augmented Prompt (RAP) template.
Experiment results over three datasets against strong baselines validate the effectiveness of ELCoRec. The code is available at https://anonymous.4open.science/r/CIKM_Code_Repo-E6F5/README.md.
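Injecting an external preference encoding "at the cost of a single token embedding" means the prompt grows by exactly one position. A minimal NumPy sketch of that idea follows, assuming a learned linear projection from the expert model's output dimension to the LLM's hidden size. The projection, its shape, and placing the soft token at the front of the sequence are illustrative assumptions, not ELCoRec's published design.

```python
import numpy as np


def prepend_soft_prompt(token_embeddings, preference_vec, projection):
    """Prepend one soft-prompt token built from an external preference encoding.

    token_embeddings: (T, d_model) embeddings of the text prompt.
    preference_vec:   (d_pref,) user-preference encoding from a separate model.
    projection:       (d_pref, d_model) learned linear map (hypothetical here).
    Returns a (T + 1, d_model) sequence: exactly one extra position is used.
    """
    soft_token = preference_vec @ projection  # (d_model,)
    return np.vstack([soft_token[None, :], token_embeddings])
```

In training, only the projection (and the expert model feeding it) would typically be updated, which keeps the extra context cost at one token regardless of how much history the expert model encodes.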

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.08982 [pdf, other]

Implementation Guidelines and Innovations in Quantum LSTM Networks

Authors: Yifan Zhou, Chong Cheng Xu, Mingi Song, Yew Kee Wong, Kangsong Du

Abstract: The rapid evolution of artificial intelligence has driven interest in Long Short-Term Memory (LSTM) networks for their effectiveness in processing sequential data. However, traditional LSTMs are limited by issues such as the vanishing gradient problem and high computational demands. Quantum computing offers a potential solution to these challenges, promising advancements in computational efficiency through the unique properties of qubits, such as superposition and entanglement. This paper presents a theoretical analysis and an implementation plan for a Quantum LSTM (qLSTM) model, which seeks to integrate quantum computing principles with traditional LSTM networks. While the proposed model aims to address the limitations of classical LSTMs, this study focuses primarily on the theoretical aspects and the implementation framework. The actual architecture and its practical effectiveness in enhancing sequential data processing remain to be developed and demonstrated in future work.

Submitted 25 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: 13 pages, 5 Figures

arXiv:2406.00012 [pdf, other]

Extracting Essential and Disentangled Knowledge for Recommendation Enhancement

Authors: Kounianhua Du, Jizheng Chen, Jianghao Lin, Menghui Zhu, Bo Chen, Shuai Li, Ruiming Tang

Abstract: Recommender models play a vital role in various industrial scenarios, but they often face the catastrophic forgetting problem caused by fast-shifting data distributions, e.g., evolving user interests and fluctuating click signals during sales promotions. To alleviate this problem, a common approach is to reuse knowledge from historical data. However, preserving the vast and fast-accumulating data is hard and causes dramatic storage overhead. Memorizing old data through a parametric knowledge base, which compresses the vast amount of raw data into model parameters, has therefore been proposed. Despite this flexibility, improving the memorization and generalization capabilities of the parametric knowledge base remains challenging. In this paper, we propose two constraints to extract Essential and Disentangled Knowledge from past data for rational and generalized recommendation enhancement, which improve the capabilities of the parametric knowledge base without increasing its size. The essential principle helps compress the input into representative vectors that capture task-relevant information and filter out noisy information. The disentanglement principle reduces the redundancy of stored information and pushes the knowledge base to focus on capturing disentangled invariant patterns. Together, these two rules promote rational compression of information for robust and generalized knowledge representations. Extensive experiments on two datasets justify the effectiveness of the proposed method.

Submitted 20 May, 2024; originally announced June 2024.

arXiv:2406.00011 [pdf, other]

DisCo: Towards Harmonious Disentanglement and Collaboration between Tabular and Semantic Space for Recommendation

Authors: Kounianhua Du, Jizheng Chen, Jianghao Lin, Yunjia Xi, Hangyu Wang, Xinyi Dai, Bo Chen, Ruiming Tang, Weinan Zhang

Abstract: Recommender systems play important roles in various applications such as e-commerce and social media. Conventional recommendation methods usually model the collaborative signals within the tabular representation space. Despite their personalization modeling and efficiency, these methods omit latent semantic dependencies. Methods that introduce semantics into recommendation then emerged, injecting knowledge from the semantic representation space, in which general language understanding is compressed. However, existing semantic-enhanced recommendation methods focus on aligning the two spaces, during which the representations of the two spaces tend to get close while their unique patterns are discarded and not well explored. In this paper, we propose DisCo to Disentangle the unique patterns from the two representation spaces and Collaborate the two spaces for recommendation enhancement, capturing both the specificity and the consistency of the two spaces. Concretely, we propose 1) a dual-side attentive network to capture the intra-domain patterns and the inter-domain patterns, 2) a sufficiency constraint to preserve the task-relevant information of each representation space and filter out the noise, and 3) a disentanglement constraint to prevent the model from discarding the unique information. These modules strike a balance between disentanglement and collaboration of the two representation spaces to produce informative pattern vectors, which can serve as extra features and be appended to arbitrary recommendation backbones for enhancement.
Experiment results validate the superiority of our method against different models and the compatibility of DisCo with different backbones. Various ablation studies and an efficiency analysis are also conducted to justify each model component.

Submitted 4 June, 2024; v1 submitted 20 May, 2024; originally announced June 2024.

arXiv:2405.16444 [pdf, other]

CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion

Authors: Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang

Abstract: Large language models (LLMs) often incorporate multiple text chunks in their inputs to provide the necessary contexts. To speed up the prefill of long LLM inputs, one can pre-compute the KV cache of a text and reuse the KV cache when the context is reused as the prefix of another LLM input. However, the reused text chunks are not always the input prefix, and when they are not, their precomputed KV caches cannot be directly used since they ignore the text's cross-attention with the preceding text in the LLM input. Thus, the benefits of reusing KV caches remain largely unrealized. This paper tackles just one question: when an LLM input contains multiple text chunks, how can we quickly combine their precomputed KV caches to achieve the same generation quality as the expensive full prefill (i.e., without reusing the KV cache)? We present CacheBlend, a scheme that reuses the pre-computed KV caches, regardless of whether they are prefixes, and selectively recomputes the KV values of a small subset of tokens to partially update each reused KV cache. Meanwhile, the small extra delay of recomputing some tokens can be pipelined with the retrieval of KV caches within the same job, allowing CacheBlend to store KV caches on slower devices with more storage capacity while retrieving them without increasing the inference delay.
By comparing CacheBlend with state-of-the-art KV cache reusing schemes on three open-source LLMs of various sizes and four popular benchmark datasets of different tasks, we show that CacheBlend reduces time-to-first-token (TTFT) by 2.2-3.3X and increases inference throughput by 2.8-5X compared with full KV recompute, without compromising generation quality or incurring more storage cost.
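The core idea of reusing precomputed KV caches while recomputing only a small subset of tokens can be sketched as a top-k selection followed by a row-wise overwrite. This minimal NumPy sketch assumes each token already has a deviation score (how far its reused KV entry is from what a full prefill would produce) and that the KV cache is a (tokens × dim) array. The scoring itself, per-layer selection, and pipelining with KV retrieval are not modeled, and the 15% default ratio is hypothetical.

```python
import numpy as np


def select_tokens_to_recompute(deviation, ratio=0.15):
    """Return sorted indices of the tokens with the largest deviation scores.

    `deviation` holds one score per token (a hypothetical input here);
    at least one token is always selected.
    """
    deviation = np.asarray(deviation, dtype=float)
    k = max(1, int(round(ratio * deviation.size)))
    return np.sort(np.argsort(deviation)[-k:])


def blend_kv(cached_kv, recomputed_kv, idx):
    """Overwrite only the selected token rows of the cached KV with recomputed values."""
    out = np.array(cached_kv, dtype=float, copy=True)
    out[idx] = np.asarray(recomputed_kv, dtype=float)[idx]
    return out
```

The cost saving comes from `ratio` being small: only `k` rows need fresh attention computation, while the remaining rows are taken directly from the reused caches.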

Submitted 3 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.12442 [pdf, other]

Learning Structure and Knowledge Aware Representation with Large Language Models for Concept Recommendation

Authors: Qingyao Li, Wei Xia, Kounianhua Du, Qiji Zhang, Weinan Zhang, Ruiming Tang, Yong Yu

Abstract: Concept recommendation aims to suggest the next concept for learners to study based on their knowledge states and the human knowledge system. While knowledge states can be predicted using knowledge tracing models, previous approaches have not effectively integrated the human knowledge system into the process of designing these educational models. In the era of rapidly evolving Large Language Models (LLMs), many fields have begun using LLMs to generate and encode text, introducing external knowledge. However, integrating LLMs into concept recommendation presents two urgent challenges: 1) How can we construct text for concepts that effectively incorporates the human knowledge system? 2) How can we effectively adapt non-smooth, anisotropic text encodings for concept recommendation? In this paper, we propose a novel Structure and Knowledge Aware Representation learning framework for concept Recommendation (SKarREC). We leverage factual knowledge from LLMs as well as the precedence and succession relationships between concepts obtained from the knowledge graph to construct textual representations of concepts. Furthermore, we propose a graph-based adapter to adapt anisotropic text embeddings to the concept recommendation task. This adapter is pre-trained through contrastive learning on the knowledge graph to obtain a smooth and structure-aware concept representation. It is then fine-tuned through the recommendation task, forming a text-to-knowledge-to-recommendation adaptation pipeline, which effectively constructs a structure- and knowledge-aware concept representation.
Our method outperforms previous adapters at transforming text encodings for concept recommendation. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed approach.
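Contrastive pre-training on a knowledge graph is commonly done with an InfoNCE objective: embeddings of related items are pulled together and unrelated items in the same batch are pushed apart. The minimal NumPy sketch below shows InfoNCE with in-batch negatives. Treating pairs of concepts linked in the knowledge graph (e.g., a concept and its prerequisite) as positives is a hypothetical choice for illustration; the abstract does not specify SKarREC's exact contrastive objective.

```python
import numpy as np


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss where row i of `positives` is the positive for row i of `anchors`
    and every other row in the batch serves as a negative.

    Both inputs are (B, D) embedding matrices and are L2-normalized first.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (B, B) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

The loss is near zero when each anchor is most similar to its own positive and grows when anchors are closer to other items in the batch; a lower temperature sharpens that contrast.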

Submitted 20 May, 2024; originally announced May 2024.

Comments: 11 pages, 8 figures

arXiv:2405.06902

Causal Inference from Slowly Varying Nonstationary Processes

Authors: Kang Du, Yu Xiang

Abstract: Causal inference from observational data following the restricted structural causal models (SCM) framework hinges largely on the asymmetry between cause and effect from the data generating mechanisms, such as non-Gaussianity or non-linearity. This methodology can be adapted to stationary time series, yet inferring causal relationships from nonstationary time series remains a challenging task. In this work, we propose a new class of restricted SCM, via a time-varying filter and stationary noise, and exploit the asymmetry from nonstationarity for causal identification in both bivariate and network settings. We propose efficient procedures by leveraging powerful estimates of the bivariate evolutionary spectra for slowly varying processes. Various synthetic and real datasets that involve high-order and non-smooth filters are evaluated to demonstrate the effectiveness of our proposed methodology.

Submitted 29 May, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

Comments: This work was intended as a replacement of arXiv:2012.13025 and any subsequent updates will appear there

arXiv:2405.02355

CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation

Authors: Kounianhua Du, Renting Rui, Huacan Chai, Lingyue Fu, Wei Xia, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang

Abstract: Using large language models to generate code has shown promise for revolutionizing software development. Despite the intelligence shown by general large language models, their specificity in code generation can still be improved due to the syntactic gap and mismatched vocabulary between natural language and different programming languages. In addition, programming languages are inherently logical and complex, making them hard to generate correctly. Existing methods rely on multiple prompts to the large language model to explore better solutions, which is expensive. In this paper, we propose Syntax Graph Retrieval Augmented Code Generation (CodeGRAG) to enhance the performance of LLMs in single-round code generation tasks. CodeGRAG extracts and summarizes the control flow and data flow of code blocks to fill the gap between programming languages and natural language. The extracted external structural knowledge models the inherent flows of code blocks, which can help LLMs better understand code syntax and serve as a bridge among different programming languages. CodeGRAG significantly improves the code generation ability of LLMs and can even offer performance gains for cross-lingual code generation, e.g., C++ for Python.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.18164

Empirical approximation to invariant measures of mean-field Langevin dynamics

Authors: Wenjing Cao, Kai Du

Abstract: This paper is concerned with the approximation of invariant measures for Langevin dynamics of McKean--Vlasov type. Under dissipativity and Lipschitz conditions, we prove that the empirical measures of both the mean-field and self-interacting Langevin dynamics converge to the invariant measure in the Wasserstein distance. Numerical experiments are conducted to illustrate the theoretical results.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 21 pages, 1 figure

MSC Class: 60B10; 37M25; 82C31; 60H10

arXiv:2404.15245

Mining Invariance from Nonlinear Multi-Environment Data: Binary Classification

Authors: Austin Goddard, Kang Du, Yu Xiang

Abstract: Making predictions in an unseen environment given data from multiple training environments is a challenging task. We approach this problem from an invariance perspective, focusing on binary classification to shed light on general nonlinear data generation mechanisms. We identify a unique form of invariance that exists solely in a binary setting that allows us to train models invariant over environments. We provide sufficient conditions for such invariance and show it is robust even when environmental conditions vary greatly. Our formulation admits a causal interpretation, allowing us to compare it with various frameworks. Finally, we propose a heuristic prediction method and conduct experiments using real and synthetic datasets.

Submitted 3 July, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: Accepted to the 2024 International Symposium on Information Theory (ISIT)

arXiv:2404.07490

Low-energy spin dynamics in a Kitaev material Na3Ni2BiO6 investigated by NMR

Authors: Xinyu Shi, Yi Cui, Yanyan Shangguan, Xiaoyu Xu, Zhanlong Wu, Ze Hu, Shuo Li, Kefan Du, Ying Chen, Long Ma, Zhengxin Liu, Jinsheng Wen, Jinshan Zhang, Weiqiang Yu

Abstract: We performed 23Na NMR and magnetization measurements on an S = 1, quasi-2D honeycomb lattice antiferromagnet Na3Ni2BiO6. A large positive Curie-Weiss constant of 22.9 K is observed. The NMR spectra at low fields are consistent with a "zigzag" magnetic order, indicating a large easy-axis anisotropy. With the field applied along the c* axis, the NMR spectra confirm the existence of a 1/3-magnetization plateau phase between 5.1 T and 7.1 T. The transition from the zigzag order to the 1/3-magnetization plateau phase is also found to be first order. In the 1/3-magnetization plateau phase, the spin gap decreases monotonically and reaches zero at a quantum critical field Hc = 8.35 T before the system enters the fully polarized phase. These data suggest the existence of exchange frustration in the system along with strong ferromagnetic interactions, opening the possibility of Kitaev physics. In addition, deep in the ordered phase, 1/T1 at high fields either levels off or is enhanced upon cooling below 3 K, which suggests the existence of low-energy fluctuations.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 7 pages, 7 figures

arXiv:2404.04633

Context versus Prior Knowledge in Language Models

Authors: Kevin Du, Vésteinn Snæbjarnarson, Niklas Stoehr, Jennifer C. White, Aaron Schein, Ryan Cotterell

Abstract: To answer a question, language models often need to integrate prior knowledge learned during pretraining and new information presented in context. We hypothesize that models perform this integration in a predictable way across different questions and contexts: models will rely more on prior knowledge for questions about entities (e.g., persons, places, etc.) that they are more familiar with due to higher exposure in the training corpus, and be more easily persuaded by some contexts than others. To formalize this problem, we propose two mutual information-based metrics to measure a model's dependency on a context and on its prior about an entity: first, the persuasion score of a given context represents how much a model depends on the context in its decision, and second, the susceptibility score of a given entity represents how much the model can be swayed away from its original answer distribution about an entity. We empirically test our metrics for their validity and reliability. Finally, we explore and find a relationship between the scores and the model's expected familiarity with an entity, and provide two use cases to illustrate their benefits.
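To make the two mutual information-based metrics concrete, here is a minimal sketch over toy discrete answer distributions. It assumes (our reading, not spelled out in the abstract) that the persuasion score of a context is the KL divergence between the answer distribution given that context and the answer distribution averaged over contexts, and that the susceptibility score is the mutual information between answer and context, i.e. the context-weighted average of persuasion scores. All names and the toy numbers are hypothetical.

```python
import math
from typing import Dict, List

AnswerDist = Dict[str, float]  # answer string -> probability


def kl_divergence(p: AnswerDist, q: AnswerDist) -> float:
    """KL(p || q) in nats; assumes q[a] > 0 wherever p[a] > 0."""
    return sum(pa * math.log(pa / q[a]) for a, pa in p.items() if pa > 0.0)


def answer_marginal(per_context: List[AnswerDist], weights: List[float]) -> AnswerDist:
    """p(A | q) = sum_c p(c) p(A | c, q): answers averaged over contexts."""
    marginal: AnswerDist = {}
    for dist, w in zip(per_context, weights):
        for answer, prob in dist.items():
            marginal[answer] = marginal.get(answer, 0.0) + w * prob
    return marginal


def persuasion_score(per_context: List[AnswerDist], weights: List[float], i: int) -> float:
    """How far context i moves the answer distribution away from the marginal."""
    return kl_divergence(per_context[i], answer_marginal(per_context, weights))


def susceptibility_score(per_context: List[AnswerDist], weights: List[float]) -> float:
    """I(A; C | q): the context-weighted average of the persuasion scores."""
    return sum(w * persuasion_score(per_context, weights, i) for i, w in enumerate(weights))


# Two equally likely contexts that each fully determine the answer.
contexts = [{"yes": 1.0, "no": 0.0}, {"yes": 0.0, "no": 1.0}]
print(susceptibility_score(contexts, [0.5, 0.5]))
```

If every context leaves the answer distribution unchanged, the susceptibility is zero; if each context pins down a different answer, it reaches its maximum for that answer space (log 2 nats in this two-answer example).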

Submitted 16 June, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

Comments: Long paper accepted at ACL 2024

arXiv:2404.02547

Well-posedness of the obstacle problem for stochastic nonlinear diffusion equations: an entropy formulation

Authors: Kai Du, Ruoyang Liu

Abstract: In this paper, we establish existence, uniqueness and stability results for the obstacle problem associated with a degenerate nonlinear diffusion equation perturbed by conservative gradient noise. Our approach revolves around introducing a new entropy formulation for stochastic variational inequalities. As a consequence, we obtain a novel well-posedness result for the obstacle problem of deterministic porous medium equations with nonlinear reaction terms.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 36 pages

MSC Class: 60H15; 35K86; 35K65; 47J20

arXiv:2404.00633

IPT-V2: Efficient Image Processing Transformer using Hierarchical Attentions

Authors: Zhijun Tu, Kunpeng Du, Hanting Chen, Hailing Wang, Wei Li, Jie Hu, Yunhe Wang

Abstract: Recent advances have demonstrated the powerful capability of the transformer architecture in image restoration. However, our analysis indicates that existing transformer-based methods cannot simultaneously establish both exact global and local dependencies, which are critical for restoring the details and missing content of degraded images. To this end, we present an efficient image processing transformer architecture with hierarchical attentions, called IPT-V2, which adopts a focal context self-attention (FCSA) and a global grid self-attention (GGSA) to obtain adequate token interactions in local and global receptive fields. Specifically, FCSA applies the shifted window mechanism to channel self-attention, helping capture the local context and mutual interactions across channels. GGSA constructs long-range dependencies in the cross-window grid and aggregates global information in the spatial dimension. Moreover, we introduce a structural re-parameterization technique into the feed-forward network to further improve model capability. Extensive experiments demonstrate that IPT-V2 achieves state-of-the-art results on various image processing tasks, covering denoising, deblurring and deraining, and obtains a much better trade-off between performance and computational complexity than previous methods. In addition, we extend our method to image generation as a latent diffusion backbone, where it significantly outperforms DiTs.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.17555

Particle approximation for a conditional McKean--Vlasov stochastic differential equation

Authors: Kai Du, Yunzhang Li, Yuyang Ye

Abstract: In this paper, we construct a type of interacting particle system to approximate a class of stochastic differential equations whose coefficients depend on the conditional probability distributions of the processes given partial observations. After proving the well-posedness and regularity of the particle systems, we establish a quantitative convergence result for the empirical measures of the particle systems in the Wasserstein space as the number of particles increases. Moreover, we discuss an Euler--Maruyama scheme for the particle system and validate its strong convergence. A numerical experiment is conducted to illustrate our results.
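For readers unfamiliar with Euler--Maruyama schemes for interacting particle systems, the sketch below shows the basic mechanics on a deliberately simplified model: an unconditional mean-field system in which each particle drifts toward the empirical mean, dX^i = -(X^i - mean(X)) dt + sigma dW^i. This is our own toy model, not the conditional McKean--Vlasov system of the paper (there are no partial observations and no conditional distributions here); the function name and parameters are hypothetical.

```python
import math
import random
from typing import List


def euler_maruyama_particles(
    x0: List[float], dt: float, steps: int, sigma: float, seed: int = 0
) -> List[float]:
    """Simulate N interacting particles dX^i = -(X^i - mean(X)) dt + sigma dW^i.

    The drift depends on the empirical measure of the particles only through
    its mean, a toy stand-in for a distribution-dependent coefficient.
    """
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        empirical_mean = sum(x) / n  # evaluated once per step, from the current state
        x = [
            xi - (xi - empirical_mean) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for xi in x
        ]
    return x


particles = euler_maruyama_particles([-1.0, 0.0, 3.0], dt=0.1, steps=50, sigma=0.2, seed=1)
print(particles)
```

Each step uses the empirical measure at the current time in the drift and adds an independent Gaussian increment with variance dt per particle; with sigma = 0 the particles contract toward their (preserved) mean.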

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.16520

CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification

Authors: Guangqian Yang, Kangrui Du, Zhihan Yang, Ye Du, Yongping Zheng, Shujun Wang

Abstract: Alzheimer's disease (AD) is an incurable neurodegenerative condition leading to cognitive and functional deterioration. Given the lack of a cure, prompt and precise AD diagnosis is vital, a complex process dependent on multiple factors and multi-modal data. While successful efforts have been made to integrate multi-modal representation learning into medical datasets, scant attention has been given to 3D medical images. In this paper, we propose the Contrastive Masked Vim Autoencoder (CMViM), the first efficient representation learning method tailored for 3D multi-modal data. Our proposed framework is built on a masked Vim autoencoder to learn a unified multi-modal representation and the long-range dependencies contained in 3D medical images. We also introduce an intra-modal contrastive learning module to enhance the capability of the multi-modal Vim encoder for modeling discriminative features within the same modality, and an inter-modal contrastive learning module to alleviate misaligned representations among modalities. Our framework consists of two main steps: 1) incorporating Vision Mamba (Vim) into the masked autoencoder to reconstruct 3D masked multi-modal data efficiently, and 2) aligning the multi-modal representations with contrastive learning mechanisms from both intra-modal and inter-modal aspects. Our framework is pre-trained and validated on the ADNI2 dataset and evaluated on the downstream task of AD classification. The proposed CMViM yields a 2.7\% AUC improvement compared with other state-of-the-art methods.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 11 pages, 1 figure

arXiv:2403.12559

Confidence Self-Calibration for Multi-Label Class-Incremental Learning

Authors: Kaile Du, Yifan Zhou, Fan Lyu, Yuyang Li, Chen Lu, Guangcan Liu

Abstract: The partial label challenge in Multi-Label Class-Incremental Learning (MLCIL) arises when only the new classes are labeled during training, while past and future labels remain unavailable. This issue leads to a proliferation of false-positive errors due to erroneously high-confidence multi-label predictions, exacerbating catastrophic forgetting within the disjoint label space. In this paper, we aim to refine multi-label confidence calibration in MLCIL and propose a Confidence Self-Calibration (CSC) approach. First, for label relationship calibration, we introduce a class-incremental graph convolutional network that bridges the isolated label spaces by constructing a learnable, dynamically extended label relationship graph. Then, for confidence calibration, we present a max-entropy regularization for each multi-label increment, facilitating confidence self-calibration through the penalization of over-confident output distributions. Our approach attains new state-of-the-art results in MLCIL tasks on both the MS-COCO and PASCAL VOC datasets, with the calibration of label confidences confirmed through our methodology.
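The abstract does not give the exact form of CSC's max-entropy regularization, so the following is only a generic sketch of the idea it names: treat each per-label sigmoid output as a Bernoulli distribution and subtract a weighted mean entropy from the binary cross-entropy loss, so that over-confident (low-entropy) outputs are penalized. The function names and the weight `lam` are hypothetical, and this is a standard confidence penalty rather than the paper's exact objective.

```python
import math
from typing import List

EPS = 1e-12  # keeps log() away from 0 for saturated probabilities


def _clip(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def bernoulli_entropy(p: float) -> float:
    """Entropy (nats) of one sigmoid output treated as a Bernoulli distribution."""
    p = _clip(p)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def binary_cross_entropy(p: float, y: int) -> float:
    p = _clip(p)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def max_entropy_regularized_loss(probs: List[float], labels: List[int], lam: float) -> float:
    """Mean BCE over labels minus lam * mean output entropy.

    Subtracting the entropy term lowers the loss for less peaked outputs,
    i.e. it penalizes over-confident predictions.
    """
    n = len(probs)
    bce = sum(binary_cross_entropy(p, y) for p, y in zip(probs, labels)) / n
    entropy = sum(bernoulli_entropy(p) for p in probs) / n
    return bce - lam * entropy


print(max_entropy_regularized_loss([0.9, 0.2, 0.7], [1, 0, 1], lam=0.1))
```

Setting `lam = 0` recovers plain multi-label BCE; larger values trade some fit for softer, better-calibrated confidences.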

Submitted 12 August, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: Accepted at the European Conference on Computer Vision (ECCV) 2024

arXiv:2403.11434

Earth+: on-board satellite imagery compression leveraging historical earth observations

Authors: Kuntai Du, Yihua Cheng, Peder Olsen, Shadi Noghabi, Ranveer Chandra, Junchen Jiang

Abstract: With the increasing deployment of earth observation satellite constellations, the downlink (satellite-to-ground) capacity often limits the freshness, quality, and coverage of the imagery data available to applications on the ground. To overcome the downlink limitation, we present Earth+, a new satellite imagery compression system that, instead of compressing each image individually, pinpoints and downloads only recent imagery changes with respect to historical reference images. To minimize the amount of change, it is critical to make reference images as fresh as possible. Earth+ enables each satellite to choose fresh reference images not only from its own historical images but also from past images of other satellites across the entire constellation. To share reference images across satellites, Earth+ utilizes the limited capacity of the existing uplink (ground-to-satellite) by judiciously selecting and compressing reference images while still allowing accurate change detection. In short, Earth+ is the first to make reference-based compression efficient, by enabling constellation-wide sharing of fresh reference images across satellites. Our evaluation shows that Earth+ can reduce downlink usage by a factor of 3.3 compared to state-of-the-art on-board image compression techniques, without sacrificing image quality or using more on-board computing or storage resources, or more uplink bandwidth than currently available.
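To illustrate the core idea of reference-based compression (download only what changed relative to a reference image), here is a minimal tile-differencing sketch. The abstract does not describe how Earth+ detects changes; the fixed tile grid, the mean-absolute-difference test, the threshold value and the function name are all our assumptions, and real imagery would additionally need registration and illumination/cloud handling that this toy ignores.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale pixels, row-major


def changed_tiles(current: Image, reference: Image, tile: int, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) indices of tiles whose mean absolute difference from
    the reference image exceeds `threshold`; only these would be downlinked."""
    height, width = len(current), len(current[0])
    selected = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            diffs = [
                abs(current[y][x] - reference[y][x])
                for y in range(top, min(top + tile, height))
                for x in range(left, min(left + tile, width))
            ]
            if sum(diffs) / len(diffs) > threshold:
                selected.append((top // tile, left // tile))
    return selected


reference = [[0.0] * 4 for _ in range(4)]
current = [row[:] for row in reference]
current[3][3] = 1.0  # a small change in the bottom-right corner
print(changed_tiles(current, reference, tile=2, threshold=0.1))
```

The fresher the reference, the fewer tiles cross the threshold, which is why the abstract stresses sharing fresh references across the constellation.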

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.01714

doi 10.1103/PhysRevB.109.184407

Molecular intercalation in the van der Waals antiferromagnets FePS3 and NiPS3

Authors: Cong Li, Ze Hu, Xiaofei Hou, Sheng Xu, Zhanlong Wu, Kefan Du, Shuo Li, Xiaoyu Xu, Ying Chen, Zeyu Wang, Tiancheng Mu, Tian-Long Xia, Yanfeng Guo, B. Normand, Weiqiang Yu, Yi Cui

Abstract: We have performed electrochemical treatment of the van der Waals antiferromagnetic materials FePS$_3$ and NiPS$_3$ with the ionic liquid EMIM-BF$_4$, achieving significant molecular intercalation. Mass analysis of the intercalated compounds, EMIM$_x$-FePS$_3$ and EMIM$_x$-NiPS$_3$, indicated respective intercalation levels, $x$, of approximately 27\% and 37\%, and X-ray diffraction measurements demonstrated a massive (over 50\%) enhancement of the $c$-axis lattice parameters. To investigate the consequences of these changes for the magnetic properties, we performed magnetic susceptibility and $^{31}$P nuclear magnetic resonance (NMR) studies of both systems. For EMIM$_x$-FePS$_3$, intercalation reduces the magnetic ordering temperature from $T_N = 120$~K to 78~K, and we find a spin gap in the antiferromagnetic phase that drops from 45~K to 30~K. For EMIM$_x$-NiPS$_3$, the ordering temperature is almost unaffected (changing from 148~K to 145~K), but a change towards nearly isotropic spin fluctuations suggests an alteration of the magnetic Hamiltonian. Such relatively modest changes, given that the huge extension of the $c$ axes is expected to cause a very strong suppression of any interlayer interactions, point unequivocally to the conclusion that the magnetic properties of both parent compounds are determined solely by two-dimensional (2D), intralayer physics.
The changes in transition temperatures and low-temperature spin dynamics in both compounds therefore indicate that intercalation also results in a significant modulation of the intralayer magnetic interactions, which we propose is due to charge doping and localization on the P sites. Our study establishes chemical intercalation with ionic liquids as an effective method to control not only the interlayer but also the intralayer interactions in quasi-2D magnetic materials.

Submitted 3 March, 2024; originally announced March 2024.

Journal ref: Physical Review B 109, 184407 (2024)

arXiv:2402.11492

Exponential Cluster Synchronization in Fast Switching Network Topologies: A Pinning Control Approach with Necessary and Sufficient Conditions

Authors: Ku Du, Yu Kang

Abstract: This research investigates the synchronization problem among multiple agents operating within a dynamic, fast-switching network topology. We concentrate on cluster synchronization of coupled linear systems under pinning control, providing both necessary and sufficient conditions. As a pivotal aspect, this paper aims to present the weakest possible conditions under which the coupled linear system achieves exponential cluster synchronization. Within the fast-switching framework, we first examine the conditions by transforming the consensus problem into a stability problem and introducing a new variable: the coupled system achieves cluster synchronization if the system is controllable, the communication topology switches fast enough, and the coupling strength is sufficiently strong. Then, using the Lyapunov theorem, we show that controllability of the state matrix is necessary for cluster synchronization. Furthermore, by incorporating contraction theory and an invariant manifold, we demonstrate that the existence of an average of the switching topology is imperative for achieving cluster synchronization. Finally, we present three simulations to validate the efficacy of the proposed approach.

Submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.08182

Variational Continual Test-Time Adaptation

Authors: Fan Lyu, Kaile Du, Yuyang Li, Hanyu Zhao, Zhang Zhang, Guangcan Liu, Liang Wang

Abstract: Prior drift is a crucial issue in Continual Test-Time Adaptation (CTTA) methods that only use unlabeled test data, as it can cause significant error propagation. In this paper, we introduce VCoTTA, a variational Bayesian approach to measure uncertainties in CTTA. At the source stage, we transform a pre-trained deterministic model into a Bayesian Neural Network (BNN) via a variational warm-up strategy, injecting uncertainties into the model. At test time, we employ a mean-teacher update strategy using variational inference for the student model and an exponential moving average for the teacher model. Our novel approach updates the student model by combining priors from both the source and teacher models. The evidence lower bound is formulated as the cross-entropy between the student and teacher models, along with the Kullback-Leibler (KL) divergence of the prior mixture. Experimental results on three datasets demonstrate the method's effectiveness in mitigating prior drift within the CTTA framework.
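The teacher half of the mean-teacher strategy described above is an exponential moving average (EMA) of the student's parameters. The sketch below shows only that EMA step, on plain Python lists; it does not model the variational student update or the source/teacher prior mixture. The dictionary-of-lists parameter layout, the function name and the default momentum of 0.999 are hypothetical choices for illustration.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened values


def ema_update(teacher: Params, student: Params, momentum: float = 0.999) -> Params:
    """Return teacher <- momentum * teacher + (1 - momentum) * student, per parameter."""
    return {
        name: [momentum * t + (1.0 - momentum) * s for t, s in zip(values, student[name])]
        for name, values in teacher.items()
    }


teacher = {"w": [0.0, 0.0]}
student = {"w": [1.0, 2.0]}
teacher = ema_update(teacher, student, momentum=0.9)
print(teacher)
```

A momentum close to 1 makes the teacher change slowly, which is what lets it act as a stable target while the student adapts to each new test batch.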

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.06884

Low-Rank Approximation of Structural Redundancy for Self-Supervised Learning

Authors: Kang Du, Yu Xiang

Abstract: We study the data-generating mechanism for reconstructive SSL to shed light on its effectiveness. With an infinite amount of labeled samples, we provide a sufficient and necessary condition for perfect linear approximation. The condition reveals a full-rank component that preserves the label classes of Y, along with a redundant component. Motivated by the condition, we propose to approximate the redundant component by a low-rank factorization and measure the approximation quality by introducing a new quantity $ε_s$, parameterized by the factorization rank s. We incorporate $ε_s$ into the excess risk analysis under both linear regression and ridge regression settings, where the latter regularization approach handles scenarios in which the dimension of the learned features is much larger than the number of labeled samples n for downstream tasks. We design three stylized experiments to compare SSL with supervised learning under different settings to support our theoretical findings.

Submitted 27 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: Accepted to the 3rd Conference on Causal Learning and Reasoning (CLeaR)

arXiv:2402.02547

Integration of cognitive tasks into artificial general intelligence test for large models

Authors: Youzhi Qu, Chen Wei, Penghui Du, Wenxin Che, Chi Zhang, Wanli Ouyang, Yatao Bian, Feiyang Xu, Bin Hu, Kai Du, Haiyan Wu, Jia Liu, Quanying Liu

Abstract: During the evolution of large models, performance evaluation is necessary to assess their capabilities and ensure safety before practical application. However, current model evaluations mainly rely on specific tasks and datasets, lacking a unified framework for assessing the multidimensional intelligence of large models. In this perspective, we advocate for a comprehensive framework of cognitive science-inspired artificial general intelligence (AGI) tests, aimed at fulfilling the testing needs of large models with enhanced capabilities. The cognitive science-inspired AGI tests encompass the full spectrum of intelligence facets, including crystallized intelligence, fluid intelligence, social intelligence, and embodied intelligence. To assess the multidimensional intelligence of large models, the AGI tests consist of a battery of well-designed cognitive tests adopted from human intelligence tests, which are then naturally encapsulated into an immersive virtual community. We propose increasing the complexity of AGI testing tasks commensurate with advancements in large models and emphasize the necessity of interpreting test results to avoid false negatives and false positives. We believe that cognitive science-inspired AGI tests will effectively guide the targeted improvement of large models in specific dimensions of intelligence and accelerate the integration of large models into human society.

Submitted 5 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2401.12961

doi 10.1145/3672198.3673797

Eloquent: A More Robust Transmission Scheme for LLM Token Streaming

Authors: Hanchen Li, Yuhan Liu, Yihua Cheng, Siddhant Ray, Kuntai Du, Junchen Jiang

Abstract: To render each generated token in real time for users, the Large Language Model (LLM) server generates tokens one by one and streams each token (or group of a few tokens) through the network to the user right after generation, which we refer to as LLM token streaming. However, under unstable network conditions, the LLM token streaming experience could suffer greatly from stalls, since one packet loss could block the rendering of later tokens even if the packets containing them arrive on time. With a measurement study, we show that current applications suffer from increased stalls under unstable networks. For this emerging token streaming problem in LLM chatbots, which differs from previous multimedia and text applications, we propose a novel transmission scheme, called Eloquent, which puts newly generated tokens as well as currently unacknowledged tokens in the next outgoing packet. This ensures that each packet contains some new tokens and, in the meantime, is independently rendered when received, avoiding the aforementioned stalls caused by missing packets. Through simulation under various networks, we show Eloquent reduces the stall ratio (proportion of token rendering wait time) by 71.0% compared to the retransmission method commonly used by real chatbot applications and by 31.6% compared to the baseline packet duplication scheme. By tailoring Eloquent to fit the token-by-token generation of LLMs, we enable chatbots to respond like an eloquent speaker for users to better enjoy pervasive AI.
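The packetization rule in the abstract (each outgoing packet carries the new tokens plus every token not yet acknowledged) can be sketched as a toy sender/receiver pair. This is not the paper's implementation: the class names, the sequence numbering and the cumulative-ACK convention (the receiver acknowledges the length of its rendered prefix) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Packet = List[Tuple[int, str]]  # (sequence number, token) pairs


@dataclass
class EloquentSender:
    next_seq: int = 0
    unacked: Dict[int, str] = field(default_factory=dict)

    def send(self, new_tokens: List[str]) -> Packet:
        """Build the next packet: all unacknowledged tokens plus the new ones."""
        for token in new_tokens:
            self.unacked[self.next_seq] = token
            self.next_seq += 1
        return sorted(self.unacked.items())

    def on_ack(self, next_expected: int) -> None:
        """Cumulative ACK: the receiver has rendered every seq below next_expected."""
        for seq in [s for s in self.unacked if s < next_expected]:
            del self.unacked[seq]


@dataclass
class TokenReceiver:
    rendered: List[str] = field(default_factory=list)

    def on_packet(self, packet: Packet) -> int:
        """Render any tokens that extend the rendered prefix; return the ACK value."""
        for seq, token in packet:
            if seq == len(self.rendered):
                self.rendered.append(token)
        return len(self.rendered)

    def text(self) -> str:
        return "".join(self.rendered)


sender, receiver = EloquentSender(), TokenReceiver()
sender.send(["Hel"])  # suppose this packet is lost
ack = receiver.on_packet(sender.send(["lo"]))  # carries "Hel" again alongside "lo"
sender.on_ack(ack)
print(receiver.text())  # Hello
```

Because the second packet repeats the lost token, the receiver renders "Hello" without waiting for a retransmission timeout. As written, a packet grows while ACKs are missing; a practical sender would bound the payload, a detail the abstract does not specify.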

Submitted 16 June, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: In SIGCOMM Workshop on Networks for AI Computing (NAIC '24)
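The packetization rule described for Eloquent can be illustrated with a minimal Python sketch. It assumes cumulative acknowledgments that reach the sender instantly, one packet per generation step, and no cap on packet size; the class and function names are hypothetical and this is not the authors' implementation. Because every packet carries all unacknowledged tokens, any single delivered packet lets the receiver extend its rendered prefix without waiting for a retransmission.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    tokens: list[str] = field(default_factory=list)  # every token generated so far
    acked: int = 0  # length of the token prefix the receiver has confirmed

    def next_packet(self, new_tokens: list[str]) -> tuple[int, list[str]]:
        """Append fresh tokens, then build a packet holding every
        unacknowledged token plus the new ones, tagged with its start index."""
        self.tokens.extend(new_tokens)
        return self.acked, self.tokens[self.acked:]

    def on_ack(self, received_upto: int) -> None:
        # Cumulative ack: the receiver already holds tokens[:received_upto].
        self.acked = max(self.acked, received_upto)


@dataclass
class Receiver:
    rendered: list[str] = field(default_factory=list)

    def on_packet(self, start: int, payload: list[str]) -> int:
        # start <= len(self.rendered) because the sender only omits tokens
        # that this receiver has acknowledged, so the packet renders on its own.
        self.rendered.extend(payload[len(self.rendered) - start:])
        return len(self.rendered)


def simulate(stream: list[list[str]], lost: set[int]) -> list[str]:
    """Send one packet per generation step; packets whose step index is in
    `lost` are dropped, and acks for delivered packets arrive instantly."""
    sender, receiver = Sender(), Receiver()
    for step, new_tokens in enumerate(stream):
        start, payload = sender.next_packet(new_tokens)
        if step in lost:
            continue  # lost packet: nothing rendered, no ack sent
        sender.on_ack(receiver.on_packet(start, payload))
    return receiver.rendered


stream = [["Hello"], [","], [" world"], ["!"]]
print("".join(simulate(stream, lost={1})))  # prints: Hello, world!
```

With the second packet lost, the third packet still delivers both the missing "," and the new " world", so rendering resumes as soon as it arrives instead of waiting for a timeout-driven retransmission.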

arXiv:2401.11788 [pdf, other]

Obtaining the pseudoinverse solution of singular range-symmetric linear systems with GMRES-type methods

Authors: Kui Du, Jia-Jun Fan, Fang Wang

Abstract: It is well known that for singular inconsistent range-symmetric linear systems, the generalized minimal residual (GMRES) method determines a least squares solution without breakdown. The least squares solution reached may or may not be the pseudoinverse solution. We show that a lift strategy can be used to obtain the pseudoinverse solution. In addition, we propose a new iterative method named RSMAR (minimum $\mathbf A$-residual) for range-symmetric linear systems $\mathbf A\mathbf x=\mathbf b$. At step $k$, RSMAR minimizes $\|\mathbf A\mathbf r_k\|$ in the $k$th Krylov subspace generated with $\{\mathbf A, \mathbf r_0\}$ rather than $\|\mathbf r_k\|$, where $\mathbf r_k$ is the $k$th residual vector and $\|\cdot\|$ denotes the Euclidean vector norm. We show that RSMAR and GMRES terminate with the same least squares solution when applied to range-symmetric linear systems. We provide two implementations for RSMAR. Our numerical experiments show that RSMAR is the most suitable method among GMRES-type methods for singular inconsistent range-symmetric linear systems.

Submitted 22 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: 22 pages, 4 figures

MSC Class: 15A06; 15A09; 65F10; 65F25; 65F50
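To make the difference between the two objectives concrete, here is a naive dense NumPy sketch. It is not either of the paper's two RSMAR implementations: it forms the Krylov basis explicitly, which is only reasonable for tiny examples, and solves each least squares problem directly. The diagonal matrix is an assumed example of a singular, inconsistent, range-symmetric system.

```python
import numpy as np


def krylov_basis(A: np.ndarray, r0: np.ndarray, k: int) -> np.ndarray:
    """Orthonormal basis of K_k(A, r0) = span{r0, A r0, ..., A^(k-1) r0}.
    Forming the powers explicitly is only acceptable for tiny examples."""
    cols = [r0]
    for _ in range(k - 1):
        cols.append(A @ cols[-1])
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q


def gmres_iterate(A, b, x0, k):
    # GMRES objective: minimize ||r_k|| = ||b - A x_k|| over x_k in x0 + K_k(A, r0).
    r0 = b - A @ x0
    V = krylov_basis(A, r0, k)
    y, *_ = np.linalg.lstsq(A @ V, r0, rcond=None)
    return x0 + V @ y


def rsmar_iterate(A, b, x0, k):
    # RSMAR objective: minimize ||A r_k|| = ||A r0 - A A V y|| over the same space.
    r0 = b - A @ x0
    V = krylov_basis(A, r0, k)
    y, *_ = np.linalg.lstsq(A @ (A @ V), A @ r0, rcond=None)
    return x0 + V @ y


# Assumed example: symmetric (hence range-symmetric), singular, inconsistent.
A = np.diag([1.0, 2.0, 0.0])
b = np.ones(3)
x0 = np.zeros(3)
print(gmres_iterate(A, b, x0, 2), rsmar_iterate(A, b, x0, 2))
print(np.linalg.pinv(A) @ b)
```

Both $k = 2$ iterates equal $[1, 0.5, 1.5]$ up to rounding. That is a least squares solution, because the residual $[0, 0, 1]$ lies in the null space of $\mathbf A$, yet it differs from the pseudoinverse solution $[1, 0.5, 0]$, illustrating the remark that the least squares solution reached need not be the pseudoinverse one.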

arXiv:2401.08221 [pdf, other]

Towards Causal Relationship in Indefinite Data: Baseline Model and New Datasets

Authors: Hang Chen, Xinyu Yang, Keqing Du

Abstract: Integrating deep learning and causal discovery has led us to observe that learning causal structures and representations in dialogue and video is full of challenges. We define these data forms as "Indefinite Data", characterized by multi-structure data and multi-value representations. Unlike existing adaptable data forms, Indefinite Data still faces gaps in datasets and methods. To address the dataset gap, we release two high-quality datasets, Causalogue and Causaction, containing text dialogue samples and video action samples with causal annotations, respectively. Moreover, the method gap arises from the coexistence of multi-structure data and multi-value representations, which breaks the assumptions of all current methods and renders them infeasible on Indefinite Data. To this end, we propose a probabilistic framework as a baseline, incorporating three components designed for this gap: 1) establishing the Causation Condition of representations using the independence of noise terms under non-fixed causal structures, 2) treating causal strength as a latent variable and measuring the reconstruction loss in the correlation space, and 3) estimating the effects of latent confounders. These components enable the probabilistic model to overcome the challenges brought by the coexistence of multi-structure data and multi-value representations and pave the way for the extension of latent confounders. Comprehensive experiments have evaluated baseline results for causal structures, causal representations, and confounding disentanglement.

Submitted 16 January, 2024; originally announced January 2024.

Comments: If you are interested in the two new datasets, please contact us by email

arXiv:2401.02608 [pdf, other]

GPBiLQ and GPQMR: Two iterative methods for unsymmetric partitioned linear systems

Authors: Kui Du, Jia-Jun Fan, Fang Wang

Abstract: We introduce two iterative methods, GPBiLQ and GPQMR, for solving unsymmetric partitioned linear systems. The basic mechanism underlying GPBiLQ and GPQMR is a novel simultaneous tridiagonalization via biorthogonality that allows for short-recurrence iterative schemes. Similar to the biconjugate gradient method, it is possible to develop another method, GPBiCG, whose iterate (if it exists) can be obtained inexpensively from the GPBiLQ iterate. Whereas the iterate of GPBiCG may not exist, the iterates of GPBiLQ and GPQMR are always well defined as long as the biorthogonal tridiagonal reduction process does not break down. We discuss connections between the proposed methods and some existing methods, and give numerical experiments to illustrate the performance of the proposed methods.

Submitted 4 January, 2024; originally announced January 2024.

Comments: 22 pages, 4 figures

MSC Class: 15A06; 65F10; 65F25; 65F50

arXiv:2401.01490 [pdf]

Chirality tuning and reversing with resonant phase-change metasurfaces

Authors: Xinbo Sha, Kang Du, Yixuan Zeng, Fangxing Lai, Jun Yin, Hanxu Zhang, Bo Song, Jiecai Han, Shumin Xiao, Yuri Kivshar, Qinghai Song

Abstract: Dynamic control of circular dichroism in photonic structures is critically important for compact spectrometers, stereoscopic displays, and information processing exploiting multiple degrees of freedom. Metasurfaces can help miniaturize chiral devices but only produce static and limited chiral responses. While external stimuli are able to tune resonances, their modulations are often weak, and continuously reversing the sign of circular dichroism is extremely challenging. Here, we demonstrate a dynamically tunable chiral response of resonant metasurfaces supporting chiral bound states in the continuum by combining them with phase-change materials. The phase transition between amorphous and crystalline phases allows control of the chiral response and rapid variation of chirality from -0.947 to +0.958, backward and forward, via a chirality continuum. Our demonstrations underpin the rapid development of chiral photonics and its applications.

Submitted 2 January, 2024; originally announced January 2024.

Comments: 14 pages, 4 figures

arXiv:2312.07081 [pdf]

Giant X-ray circular dichroism in a time-reversal invariant altermagnet

Authors: Jun Okamoto, Ru-Pan Wang, Yen-Yi Chu, Hung-Wei Shiu, Amol Singh, Hsiao-Yu Huang, Chung-Yu Mou, Sucitto Teh, Horng-Tay Jeng, Kai Du, Xianghan Xu, Sang-Wook Cheong, Chao-Hung Du, Chien-Te Chen, Atsushi Fujimori, Di-Jing Huang

Abstract: X-ray circular dichroism, arising from the contrast in X-ray absorption between opposite photon helicities, serves as a spectroscopic tool to measure the magnetization of ferromagnetic materials and identify the handedness of chiral crystals. Antiferromagnets with crystallographic chirality typically lack X-ray magnetic circular dichroism because of time-reversal symmetry, yet exhibit weak X-ray natural circular dichroism. Here, we report the observation of giant natural circular dichroism in the Ni $L_3$-edge X-ray absorption of Ni$_3$TeO$_6$, a polar and chiral antiferromagnet with effective time-reversal symmetry. To unravel this intriguing phenomenon, we propose a phenomenological model that classifies the movement of photons in a chiral crystal within the same symmetry class as that of a magnetic field. The coupling of X-ray polarization with the induced magnetization yields giant X-ray natural circular dichroism, revealing the altermagnetism of Ni$_3$TeO$_6$. Our findings provide evidence for the interplay between magnetism and crystal chirality in natural optical activity. Additionally, we establish the first example of a new class of magnetic materials exhibiting circular dichroism with time-reversal symmetry.

Submitted 23 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted by Advanced Materials (2024.2.16). Revised title: Giant X-ray circular dichroism in a time-reversal invariant altermagnet. Revised drafts: main text 14 pages, 4 figures; SI 20 pages, 8 figures

arXiv:2311.18567 [pdf, other]

Grammatical Gender's Influence on Distributional Semantics: A Causal Perspective

Authors: Karolina Stańczak, Kevin Du, Adina Williams, Isabelle Augenstein, Ryan Cotterell

Abstract: How much meaning influences gender assignment across languages is an active area of research in modern linguistics and cognitive science. We can view current approaches as aiming to determine where gender assignment falls on a spectrum, from being fully arbitrarily determined to being largely semantically determined. For the latter case, there is a formulation of the neo-Whorfian hypothesis, which claims that even inanimate noun gender influences how people conceive of and talk about objects (using the choice of adjective used to modify inanimate nouns as a proxy for meaning). We offer a novel, causal graphical model that jointly represents the interactions between a noun's grammatical gender, its meaning, and adjective choice. In accordance with past results, we find a relationship between the gender of nouns and the adjectives which modify them. However, when we control for the meaning of the noun, we find that grammatical gender has a near-zero effect on adjective choice, thereby calling the neo-Whorfian hypothesis into question.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.12401

CASR: Refining Action Segmentation via Marginalizing Frame-level Causal Relationships

Authors: Keqing Du, Xinyu Yang, Hang Chen

Abstract: Integrating deep learning and causal discovery has increased the interpretability of Temporal Action Segmentation (TAS) tasks. However, frame-level causal relationships contain much complicated noise beyond the segment level, making it infeasible to directly express macro action semantics. Thus, we propose the Causal Abstraction Segmentation Refiner (CASR), which can refine TAS results from various models by enhancing video causality through marginalizing frame-level causal relationships. Specifically, we define the equivalent frame-level causal model and segment-level causal model, so that the causal adjacency matrix constructed from marginalized frame-level causal relationships is able to represent the segment-level causal relationships. CASR works by reducing the difference between the causal adjacency matrix we construct and that of the pre-segmentation results of backbone models. In addition, we propose a novel evaluation metric, Causal Edit Distance (CED), to evaluate causal interpretability. Extensive experimental results on mainstream datasets indicate that CASR significantly surpasses various existing methods in action segmentation performance, as well as in causal explainability and generalization.

Submitted 26 January, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: We found that the paper needs to be modified in the model and all experiments must be re-run, so we request to withdraw the current version

arXiv:2311.12200 [pdf]

Hydrogen-induced tunable remanent polarization in a perovskite nickelate

Authors: Yifan Yuan, Michele Kotiuga, Tae Joon Park, Yuanyuan Ni, Arnob Saha, Hua Zhou, Jerzy T. Sadowski, Abdullah Al-Mahboob, Haoming Yu, Kai Du, Minning Zhu, Sunbin Deng, Ravindra S. Bisht, Xiao Lyu, Chung-Tse Michael Wu, Peide D. Ye, Abhronil Sengupta, Sang-Wook Cheong, Xiaoshan Xu, Karin M. Rabe, Shriram Ramanathan

Abstract: Materials with field-tunable polarization are of broad interest to condensed matter sciences and solid-state device technologies. Here, using hydrogen (H) donor doping, we modify the room temperature metallic phase of a perovskite nickelate NdNiO3 into an insulating phase with both metastable dipolar polarization and space-charge polarization. We then demonstrate transient negative differential capacitance in thin film capacitors. The space-charge polarization caused by long-range movement and trapping of protons dominates when the electric field exceeds the threshold value. First-principles calculations suggest the polarization originates from the polar structure created by H doping. We find that polarization decays within ~1 second which is an interesting temporal regime for neuromorphic computing hardware design, and we implement the transient characteristics in a neural network to demonstrate unsupervised learning. These discoveries open new avenues for designing novel ferroelectric materials and electrets using light-ion doping.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 13 pages, 5 figures

arXiv:2311.11428 [pdf, other]

Self-interacting approximation to McKean-Vlasov long-time limit: a Markov chain Monte Carlo method

Authors: Kai Du, Zhenjie Ren, Florin Suciu, Songbo Wang

Abstract: For a certain class of McKean--Vlasov processes, we introduce proxy processes that substitute the mean-field interaction with self-interaction, employing a weighted occupation measure. Our study encompasses two key achievements. First, we demonstrate the ergodicity of the self-interacting dynamics, under broad conditions, by applying the reflection coupling method. Second, in scenarios where the drifts are negative intrinsic gradients of convex mean-field potential functionals, we use entropy and functional inequalities to demonstrate that the stationary measures of the self-interacting processes approximate the invariant measures of the corresponding McKean--Vlasov processes. As an application, we show how to learn the optimal weights of a two-layer neural network by training a single neuron.

Submitted 14 January, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

Comments: 42 pages, 1 figure; contains a minor correction

arXiv:2311.07240 [pdf, other]

The H I-rich Ultra-diffuse Galaxies follow the Extended Schmidt Law

Authors: Sai Zhai, Yong Shi, Zhi-Yu Zhang, Jun-Zhi Wang, Yu Gao, Qiusheng Gu, Tao Wang, Kaiyi Du, Xiaoling Yu, Xin Li

Abstract: The H I-rich ultra-diffuse galaxies (HUDGs) offer a unique case for studies of star formation laws (SFLs), as they host low star formation efficiency (SFE) and low-metallicity environments where gas is predominantly atomic. We collect a sample of six HUDGs in the field and investigate their location on the extended Schmidt law ($\Sigma_{\rm SFR} \propto (\Sigma_{\rm star}^{0.5} \Sigma_{\rm gas})^{1.09}$). They are consistent with this relation (deviations of only 1.1$\sigma$). Furthermore, we find that HUDGs follow the tight correlation between the hydrostatic pressure in the galaxy mid-plane and the quantity on the x-axis of the extended Schmidt law, $\log(\Sigma_{\rm star}^{0.5}\Sigma_{\rm gas})$. This result indicates that these HUDGs can be self-regulated systems that reach dynamical and thermal equilibrium. In this framework, stellar gravity compresses the disk vertically and counteracts the gas pressure in the galaxy mid-plane to regulate star formation, as suggested by some theoretical models.

Submitted 13 November, 2023; originally announced November 2023.

Comments: 6 pages, 4 figures, accepted for publication in MNRAS
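As a quick reading of the exponents in the extended Schmidt law quoted above (a worked consequence of the fitted relation only, not a result reported in the paper), holding one surface density fixed shows how $\Sigma_{\rm SFR}$ responds to the other:

$$
\frac{\Sigma_{\rm SFR}(\Sigma_{\rm star},\,2\Sigma_{\rm gas})}{\Sigma_{\rm SFR}(\Sigma_{\rm star},\,\Sigma_{\rm gas})} = 2^{1.09} \approx 2.13,
\qquad
\frac{\Sigma_{\rm SFR}(4\Sigma_{\rm star},\,\Sigma_{\rm gas})}{\Sigma_{\rm SFR}(\Sigma_{\rm star},\,\Sigma_{\rm gas})} = \left(4^{0.5}\right)^{1.09} = 2^{1.09} \approx 2.13.
$$

Doubling the gas surface density thus has the same effect as quadrupling the stellar surface density, and the slightly super-linear exponent makes the response a little stronger than proportional.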

arXiv:2311.00923 [pdf, other]

A Review and Roadmap of Deep Causal Model from Different Causal Structures and Representations

Authors: Hang Chen, Keqing Du, Chenguang Li, Xinyu Yang

Abstract: The fusion of causal models with deep learning, applied to increasingly intricate datasets such as the causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, extending the original causal concepts and theories to such complex, non-statistical data has met with serious challenges. In response, our study proposes redefining causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time series, images, text, and others. Indefinite data is an emergent research area that we infer from the progression of data forms. To comprehensively present these three data paradigms, we elaborate on their formal definitions, the differences manifested in datasets, resolution pathways, and the development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data from myriad research undertakings, and present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.

Submitted 1 November, 2023; originally announced November 2023.

Comments: under review

arXiv:2310.19302 [pdf, other]

Empirical approximation to invariant measures of non-degenerate McKean-Vlasov dynamics

Authors: Wenjing Cao, Kai Du

Abstract: This paper studies the approximation of invariant measures of McKean-Vlasov dynamics with non-degenerate additive noise. While prior findings necessitated a strong monotonicity condition on the McKean-Vlasov process, we expand these results to encompass dissipative and weak interaction scenarios. Utilizing a reflection coupling technique, we prove that the empirical measures of the McKean-Vlasov process and its path-dependent counterpart can converge to the invariant measure in the Wasserstein metric. The Curie-Weiss mean-field lattice model serves as a numerical example to illustrate empirical approximation.

Submitted 23 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: 21 pages, 1 figure; typos corrected, email address updated

MSC Class: 60B10; 37M25; 60F25; 60H10

arXiv:2310.18634 [pdf, other]

SSL Framework for Causal Inconsistency between Structures and Representations

Authors: Hang Chen, Xinyu Yang, Keqing Du

Abstract: The cross-pollination of deep learning and causal discovery has catalyzed a burgeoning field of research seeking to elucidate causal relationships within non-statistical data forms like images, videos, and text. Such data, often called 'indefinite data', exhibit a unique challenge, inconsistency between causal structure and representation, which is uncommon in conventional data forms. To tackle this issue, we theoretically develop intervention strategies suitable for indefinite data and derive the causal consistency condition (CCC). Moreover, we design a self-supervised learning (SSL) framework that considers interventions as 'views' and CCC as a 'philosophy', with two example implementations on Supervised Specialized Models (SSMs) and Large Language Models (LLMs), respectively. To evaluate pure inconsistency manifestations, we have prepared the first high-quality causal dialogue dataset, Causalogue. Evaluations are also performed on three other downstream tasks. Extensive experimentation has substantiated the efficacy of our methodology, illuminating how CCC could potentially play an influential role in various fields.

Submitted 28 October, 2023; originally announced October 2023.

arXiv:2310.07240 [pdf, other]

CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving

Authors: Yuhan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, Yuyang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang

Abstract: As large language models (LLMs) take on complex tasks, their inputs are supplemented with longer contexts that incorporate domain knowledge. Yet using long contexts is challenging, as nothing can be generated until the whole context is processed by the LLM. While the context-processing delay can be reduced by reusing the KV cache of a context across different inputs, fetching the KV cache, which contains large tensors, over the network can cause high extra network delays. CacheGen is a fast context-loading module for LLM systems. First, CacheGen uses a custom tensor encoder, leveraging KV cache's distributional properties to encode a KV cache into more compact bitstream representations with negligible decoding overhead, to save bandwidth usage. Second, CacheGen adapts the compression level of different parts of a KV cache to cope with changes in available bandwidth, in order to maintain low context-loading delay and high generation quality. When available bandwidth drops, CacheGen may raise the compression level for a part of the context or recompute its KV cache on the fly. We test CacheGen on popular LLMs and datasets. Compared to recent systems that reuse the KV cache, CacheGen reduces the KV cache size by 3.5-4.3x and the total delay in fetching and processing contexts by 3.2-3.7x, with negligible impact on LLM response quality. Our code is at: https://github.com/UChi-JCL/CacheGen.

Submitted 19 July, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: SIGCOMM'24
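The bandwidth adaptation described for CacheGen can be pictured with the following hypothetical Python sketch. The three compression levels, their per-token sizes, the per-chunk delay budget, and the greedy rule are all assumptions for illustration, not CacheGen's actual encoder or policy. For each context chunk it picks the least-compressed level that fits the budget at the bandwidth measured for that chunk; if none fits, it recomputes the chunk's KV cache when that is faster than the most compressed transfer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    bytes_per_token: float  # hypothetical encoded KV size at this level


# Hypothetical compression levels, ordered from least to most compressed.
LEVELS = [Level("low", 400.0), Level("mid", 200.0), Level("high", 100.0)]


def plan_chunks(chunk_tokens: list[int], bandwidths: list[float],
                budget_s: float, recompute_s: float) -> list[str]:
    """Choose, per context chunk, how to deliver its KV cache given the
    bandwidth (bytes/s) measured while that chunk is being sent."""
    plan = []
    for tokens, bw in zip(chunk_tokens, bandwidths):
        choice = None
        for level in LEVELS:  # prefer the least-compressed level that fits
            if tokens * level.bytes_per_token / bw <= budget_s:
                choice = level.name
                break
        if choice is None:
            # Nothing fits: compare the most compressed transfer with
            # recomputing this chunk's KV cache on the fly.
            fastest = tokens * LEVELS[-1].bytes_per_token / bw
            choice = "recompute" if recompute_s < fastest else LEVELS[-1].name
        plan.append(choice)
    return plan


print(plan_chunks([1000] * 4, [1e6, 5e5, 3e5, 5e4], budget_s=0.5, recompute_s=1.0))
# prints ['low', 'mid', 'high', 'recompute']
```

As bandwidth falls from chunk to chunk, the sketch steps down to coarser encodings and finally switches to recomputation, mirroring the behavior the abstract describes in prose.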

arXiv:2310.04685 [pdf, other]

Automatic and Efficient Customization of Neural Networks for ML Applications

Authors: Yuhan Liu, Chengcheng Wan, Kuntai Du, Henry Hoffmann, Junchen Jiang, Shan Lu, Michael Maire

Abstract: ML APIs have greatly relieved application developers of the burden of designing and training their own neural network models: classifying objects in an image can now be as simple as one line of Python code calling an API. However, these APIs offer the same pre-trained models regardless of how their output is used by different applications. This can be suboptimal, as not all ML inference errors can cause application failures, and the distinction between inference errors that can or cannot cause failures varies greatly across applications. To tackle this problem, we first study 77 real-world applications, which collectively use six ML APIs from two providers, to reveal common patterns of how ML API output affects applications' decision processes. Inspired by the findings, we propose ChameleonAPI, an optimization framework for ML APIs, which takes effect without changing the application source code. ChameleonAPI provides application developers with a parser that automatically analyzes the application to produce an abstract of its decision process, which is then used to devise an application-specific loss function that only penalizes API output errors critical to the application. ChameleonAPI uses the loss function to efficiently train a neural network model customized for each application and deploys it to serve API invocations from the respective application via the existing interface. Compared to a baseline that selects the best of all commercial ML APIs, we show that ChameleonAPI reduces incorrect application decisions by 43%.

Submitted 7 October, 2023; originally announced October 2023.
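The idea of penalizing only the API errors that matter to a particular application can be illustrated with a hypothetical Python sketch. The `alert_on_pets` rule stands in for the decision-process abstract that ChameleonAPI's parser would extract, and the 0/1 count is a simplified, non-differentiable stand-in for the trained loss function; neither is the authors' code.

```python
from typing import Callable, Iterable

# Hypothetical decision rule extracted from an application.
Decision = Callable[[set[str]], str]


def alert_on_pets(labels: set[str]) -> str:
    """Hypothetical application logic: alert when any pet label is detected."""
    return "alert" if labels & {"cat", "dog"} else "ignore"


def critical_error_rate(decide: Decision,
                        predictions: Iterable[set[str]],
                        truths: Iterable[set[str]]) -> float:
    """Fraction of inputs where the predicted labels flip the application's
    decision. Label errors that leave the decision unchanged cost nothing."""
    pairs = list(zip(predictions, truths))
    flips = sum(decide(p) != decide(t) for p, t in pairs)
    return flips / len(pairs)


preds = [{"dog"}, {"car"}, {"cat", "tree"}]
truths = [{"cat"}, {"dog"}, {"tree"}]
print(critical_error_rate(alert_on_pets, preds, truths))  # prints 0.6666666666666666
```

All three predictions have the wrong label set, but only two of them change what the application does; confusing "cat" with "dog" is harmless here, which is exactly the kind of error an application-specific loss would not penalize.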

arXiv:2310.04394 [pdf, other]

doi 10.1103/PhysRevB.109.L041111

Spin-Mediated Direct Photon Scattering by Plasmons in BiTeI

Authors: A. C. Lee, S. Sarkar, K. Du, H. -H. Kung, C. J. Won, K. Wang, S. -W. Cheong, S. Maiti, G. Blumberg

Abstract: We use polarization-resolved Raman spectroscopy to demonstrate that for a 3D giant Rashba system the bulk plasmon collective mode can directly couple to the Raman response even in the long-wavelength $\mathbf q \rightarrow 0$ limit. Although conventional theory predicts the plasmon spectral weight to be suppressed as the square of its quasi-momentum and thus negligibly weak in the Raman spectra, we observe a sharp in-gap plasmon mode in the Raman spectrum of BiTeI below the Rashba continuum. This coupling, in a polar system with spin-orbit coupling, occurs without assistance from phonons when the incoming photon excitation is resonant with Rashba-split intermediate states. We discuss the distinctive features of the band structure of BiTeI's giant Rashba system that enable the direct observation of the plasmon in Raman scattering.

Submitted 18 February, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: Editors' Suggestion

Journal ref: Phys. Rev. B 109, L041111 (2024)

arXiv:2310.02422 [pdf, other]

doi 10.1145/3620678.3624653

OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation

Authors: Kuntai Du, Yuhan Liu, Yitian Hao, Qizheng Zhang, Haodong Wang, Yuyang Huang, Ganesh Ananthanarayanan, Junchen Jiang

Abstract: Deep learning inference on streaming media data, such as object detection in video or LiDAR feeds and text extraction from audio waves, is now ubiquitous. To achieve high inference accuracy, these applications typically require significant network bandwidth to gather high-fidelity data and extensive GPU resources to run deep neural networks (DNNs). While the high demand for network bandwidth and GPU resources could be substantially reduced by optimally adapting the configuration knobs, such as video resolution and frame rate, current adaptation techniques fail to meet three requirements simultaneously: adapt configurations (i) with minimum extra GPU or bandwidth overhead; (ii) to reach near-optimal decisions based on how the data affects the final DNN's accuracy; and (iii) do so for a range of configuration knobs. This paper presents OneAdapt, which meets these requirements by leveraging a gradient-ascent strategy to adapt configuration knobs. The key idea is to embrace DNNs' differentiability to quickly estimate the accuracy's gradient to each configuration knob, called AccGrad. Specifically, OneAdapt estimates AccGrad by multiplying two gradients: InputGrad (i.e., how each configuration knob affects the input to the DNN) and DNNGrad (i.e., how the DNN input affects the DNN inference output). We evaluate OneAdapt across five types of configurations, four analytic tasks, and five types of input data. Compared to state-of-the-art adaptation schemes, OneAdapt cuts bandwidth usage and GPU usage by 15-59% while maintaining comparable accuracy or improves accuracy by 1-5% while using equal or fewer resources.

Submitted 3 October, 2023; originally announced October 2023.

Comments: SoCC '23
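The AccGrad decomposition described for OneAdapt can be sketched in a few lines of NumPy. Everything concrete here is a hypothetical toy: a knob that simply scales the input, a one-layer tanh model standing in for the DNN, and a central finite difference for InputGrad. In the system described, DNNGrad would come from backpropagation through the real DNN. The point is only that the product DNNGrad · InputGrad matches the knob's direct effect on the output.

```python
import numpy as np

rng = np.random.default_rng(0)
RAW = rng.normal(size=8)  # hypothetical raw sensor frame, flattened
W = rng.normal(size=8)    # weights of a hypothetical one-layer "DNN"


def make_input(knob: float) -> np.ndarray:
    """Hypothetical configuration knob that scales the data fed to the DNN."""
    return knob * RAW


def dnn(x: np.ndarray) -> float:
    """Toy differentiable stand-in for the DNN's accuracy-related output."""
    return float(np.tanh(W @ x))


def dnn_grad(x: np.ndarray) -> np.ndarray:
    # DNNGrad: d(output)/d(input), written out analytically for this toy model.
    return (1.0 - np.tanh(W @ x) ** 2) * W


def input_grad(knob: float, h: float = 1e-4) -> np.ndarray:
    # InputGrad: d(input)/d(knob) by central finite difference, treating
    # the knob's effect on the input as a black box.
    return (make_input(knob + h) - make_input(knob - h)) / (2 * h)


def acc_grad(knob: float) -> float:
    # AccGrad = DNNGrad . InputGrad (chain rule).
    return float(dnn_grad(make_input(knob)) @ input_grad(knob))


knob = 0.1
knob += 0.05 * acc_grad(knob)  # one illustrative gradient-ascent step on the knob
```

Splitting the gradient this way means only the cheap InputGrad must be estimated per knob, while the DNN side reuses one gradient for every knob that feeds the same input.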

Showing 1–50 of 174 results for author: Du, K