-
A Near-Optimal Algorithm for Convex Simple Bilevel Optimization under Weak Assumptions
Authors:
Rujun Jiang,
Xu Shi,
Jiulin Wang
Abstract:
Bilevel optimization provides a comprehensive framework that bridges single- and multi-objective optimization, encompassing various formulations, including standard nonlinear programs. This paper focuses on a specific class of bilevel optimization known as simple bilevel optimization. In these problems, the objective is to minimize a composite convex function over the optimal solution set of anoth…
▽ More
Bilevel optimization provides a comprehensive framework that bridges single- and multi-objective optimization, encompassing various formulations, including standard nonlinear programs. This paper focuses on a specific class of bilevel optimization known as simple bilevel optimization. In these problems, the objective is to minimize a composite convex function over the optimal solution set of another composite convex minimization problem. By reformulating the simple bilevel problem as finding the left-most root of a nonlinear equation, we employ a bisection scheme to efficiently obtain a solution that is $ε$-optimal for both the upper- and lower-level objectives. In each iteration, the bisection narrows down an interval by assessing the feasibility of a discriminating criterion. By introducing a novel dual approach and employing the Accelerated Proximal Gradient (APG) method, we demonstrate that each subproblem in the bisection scheme can be solved in ${\mathcal{O}}(\sqrt{(L_{g_1}+2D_z L_{f_1}+1)/ε}|\logε|^2)$ oracle queries under weak assumptions. Here, $L_{f_1}$ and $L_{g_1}$ represent the Lipschitz constants of the gradients of the upper- and lower-level objectives' smooth components, and $D_z$ is the upper bound of the optimal multiplier of the subproblem. Considering the number of binary searches, the total complexity of our proposed method is ${\mathcal{O}}(\sqrt{(L_{g_1}+2D_z L_{f_1}+1)/ε}|\logε|^3)$. Our method achieves near-optimal complexity results, comparable to those in unconstrained smooth or composite convex optimization when disregarding the logarithmic terms. Numerical experiments also demonstrate the superior performance of our method compared to the state-of-the-art.
△ Less
Submitted 13 September, 2024;
originally announced September 2024.
-
A Modified Initial Mass Function of the First Stars with Explodability Theory under Different Enrichment Scenarios
Authors:
Ruizheng Jiang,
Gang Zhao,
Haining Li,
Qianfan Xing
Abstract:
The most metal-poor stars record the earliest metal enrichment triggered by Population III stars. By comparing observed abundance patterns with theoretical yields of metal-free stars, physical properties of their first star progenitors can be inferred, including zero-age main-sequence mass and explosion energy. In this work, the initial mass distribution (IMF) of first stars is obtained from the l…
▽ More
The most metal-poor stars record the earliest metal enrichment triggered by Population III stars. By comparing observed abundance patterns with theoretical yields of metal-free stars, physical properties of their first star progenitors can be inferred, including zero-age main-sequence mass and explosion energy. In this work, the initial mass distribution (IMF) of first stars is obtained from the largest analysis to date of 406 very metal-poor stars with the newest LAMOST/Subaru high-resolution spectroscopic observations. However, the mass distribution fails to be consistent with the Salpeter IMF, which is also reported by previous studies. Here we modify the standard power-law function with explodability theory. The mass distribution of Population III stars could be well explained by ensuring the initial metal enrichment to originate from successful supernova explosions. Based on the modified power-law function, we suggest an extremely top-heavy or nearly flat initial mass function with a large explosion energy exponent. This indicates that supernova explodability should be considered in the earliest metal enrichment process in the Universe.
△ Less
Submitted 13 September, 2024;
originally announced September 2024.
-
Clustered Factor Analysis for Multivariate Spatial Data
Authors:
Yanxiu Jin,
Tomoya Wakayama,
Renhe Jiang,
Shonosuke Sugasawa
Abstract:
Factor analysis has been extensively used to reveal the dependence structures among multivariate variables, offering valuable insight in various fields. However, it cannot incorporate the spatial heterogeneity that is typically present in spatial data. To address this issue, we introduce an effective method specifically designed to discover the potential dependence structures in multivariate spati…
▽ More
Factor analysis has been extensively used to reveal the dependence structures among multivariate variables, offering valuable insight in various fields. However, it cannot incorporate the spatial heterogeneity that is typically present in spatial data. To address this issue, we introduce an effective method specifically designed to discover the potential dependence structures in multivariate spatial data. Our approach assumes that spatial locations can be approximately divided into a finite number of clusters, with locations within the same cluster sharing similar dependence structures. By leveraging an iterative algorithm that combines spatial clustering with factor analysis, we simultaneously detect spatial clusters and estimate a unique factor model for each cluster. The proposed method is evaluated through comprehensive simulation studies, demonstrating its flexibility. In addition, we apply the proposed method to a dataset of railway station attributes in the Tokyo metropolitan area, highlighting its practical applicability and effectiveness in uncovering complex spatial dependencies.
△ Less
Submitted 11 September, 2024;
originally announced September 2024.
-
Seek and Solve Reasoning for Table Question Answering
Authors:
Ruya Jiang,
Chun Wang,
Weihong Deng
Abstract:
Table-based Question Answering (TQA) involves answering questions based on tabular data. The complexity of table structures and question logic makes this task difficult even for Large Language Models (LLMs). This paper improves TQA performance by leveraging LLMs' reasoning capabilities. Inspired by how humans solve TQA tasks, we propose a Seek-and-Solve pipeline that instructs the LLM to first see…
▽ More
Table-based Question Answering (TQA) involves answering questions based on tabular data. The complexity of table structures and question logic makes this task difficult even for Large Language Models (LLMs). This paper improves TQA performance by leveraging LLMs' reasoning capabilities. Inspired by how humans solve TQA tasks, we propose a Seek-and-Solve pipeline that instructs the LLM to first seek relevant information and then answer questions. The two stages are integrated at the reasoning level, and their Chain of Thought (CoT) paths are integrated into a coherent Seek-and-Solve CoT (SS-CoT). Furthermore, we present a compact single-stage TQA-solving prompt distilled from the pipeline. Experiments demonstrate that under In-Context Learning settings, using samples with SS-CoT paths as demonstrations, the TQA-solving prompt can effectively guide the LLM to solve complex TQA tasks, resulting in improved performance and reliability. Our results highlight the importance of properly eliciting LLMs' reasoning capabilities in solving complex TQA tasks.
△ Less
Submitted 8 September, 2024;
originally announced September 2024.
-
Orbital magnetoelectric coupling of three dimensional Chern insulators
Authors:
Xin Lu,
Renwen Jiang,
Jianpeng Liu
Abstract:
Orbital magnetoelectric effect is closely related to the band topology of bulk crystalline insulators. Typical examples include the half quantized Chern-Simons orbital magnetoelectric coupling in three dimensional (3D) axion insulators and topological insulators, which are the hallmarks of their nontrivial bulk band topology. While the Chern-Simons coupling is well defined only for insulators with…
▽ More
Orbital magnetoelectric effect is closely related to the band topology of bulk crystalline insulators. Typical examples include the half quantized Chern-Simons orbital magnetoelectric coupling in three dimensional (3D) axion insulators and topological insulators, which are the hallmarks of their nontrivial bulk band topology. While the Chern-Simons coupling is well defined only for insulators with zero Chern number, the orbital magnetoelectric effects in 3D Chern insulators with nonzero (layer) Chern numbers are still open questions. In this work, we propose a never-mentioned quantization rule for the layer-resolved orbital magnetoelectric response in 3D Chern insulators, the gradient of which is exactly quantized in unit of $e^2/h$. By theoretical analysis and numerical simulations, we demonstrate that the quantized orbital magnetoelectric response remains robust for various types of interlayer hoppings and stackings, even against disorder and lack of symmetries. We argue that the robustness has a topological origin and protected by layer Chern number. It is thus promising to observe the proposed quantized orbital magnetoelectric response in a slab of 3D Chern insulator thanks to recent experimental developments.
△ Less
Submitted 28 August, 2024;
originally announced August 2024.
-
Non-Homophilic Graph Pre-Training and Prompt Learning
Authors:
Xingtong Yu,
Jie Zhang,
Yuan Fang,
Renhe Jiang
Abstract:
Graphs are ubiquitous for modeling complex relationships between objects across various fields. Graph neural networks (GNNs) have become a mainstream technique for graph-based applications, but their performance heavily relies on abundant labeled data. To reduce labeling requirement, pre-training and prompt learning has become a popular alternative. However, most existing prompt methods do not dif…
▽ More
Graphs are ubiquitous for modeling complex relationships between objects across various fields. Graph neural networks (GNNs) have become a mainstream technique for graph-based applications, but their performance heavily relies on abundant labeled data. To reduce labeling requirement, pre-training and prompt learning has become a popular alternative. However, most existing prompt methods do not differentiate homophilic and heterophilic characteristics of real-world graphs. In particular, many real-world graphs are non-homophilic, not strictly or uniformly homophilic with mixing homophilic and heterophilic patterns, exhibiting varying non-homophilic characteristics across graphs and nodes. In this paper, we propose ProNoG, a novel pre-training and prompt learning framework for such non-homophilic graphs. First, we analyze existing graph pre-training methods, providing theoretical insights into the choice of pre-training tasks. Second, recognizing that each node exhibits unique non-homophilic characteristics, we propose a conditional network to characterize the node-specific patterns in downstream tasks. Finally, we thoroughly evaluate and analyze ProNoG through extensive experiments on ten public datasets.
△ Less
Submitted 30 August, 2024; v1 submitted 22 August, 2024;
originally announced August 2024.
-
BLADE: Benchmarking Language Model Agents for Data-Driven Science
Authors:
Ken Gu,
Ruoxi Shang,
Ruien Jiang,
Keying Kuang,
Richard-John Lin,
Donghe Lyu,
Yue Mao,
Youran Pan,
Teng Wu,
Jiaqian Yu,
Yikun Zhang,
Tianmai M. Zhang,
Lanyi Zhu,
Mike A. Merrill,
Jeffrey Heer,
Tim Althoff
Abstract:
Data-driven scientific discovery requires the iterative integration of scientific domain knowledge, statistical expertise, and an understanding of data semantics to make nuanced analytical decisions, e.g., about which variables, transformations, and statistical models to consider. LM-based agents equipped with planning, memory, and code execution capabilities have the potential to support data-dri…
▽ More
Data-driven scientific discovery requires the iterative integration of scientific domain knowledge, statistical expertise, and an understanding of data semantics to make nuanced analytical decisions, e.g., about which variables, transformations, and statistical models to consider. LM-based agents equipped with planning, memory, and code execution capabilities have the potential to support data-driven science. However, evaluating agents on such open-ended tasks is challenging due to multiple valid approaches, partially correct steps, and different ways to express the same decisions. To address these challenges, we present BLADE, a benchmark to automatically evaluate agents' multifaceted approaches to open-ended research questions. BLADE consists of 12 datasets and research questions drawn from existing scientific literature, with ground truth collected from independent analyses by expert data scientists and researchers. To automatically evaluate agent responses, we developed corresponding computational methods to match different representations of analyses to this ground truth. Though language models possess considerable world knowledge, our evaluation shows that they are often limited to basic analyses. However, agents capable of interacting with the underlying data demonstrate improved, but still non-optimal, diversity in their analytical decision making. Our work enables the evaluation of agents for data-driven science and provides researchers deeper insights into agents' analysis approaches.
△ Less
Submitted 20 August, 2024; v1 submitted 18 August, 2024;
originally announced August 2024.
-
DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs
Authors:
Dongyuan Li,
Shiyin Tan,
Ying Zhang,
Ming Jin,
Shirui Pan,
Manabu Okumura,
Renhe Jiang
Abstract:
Dynamic graph learning aims to uncover evolutionary laws in real-world systems, enabling accurate social recommendation (link prediction) or early detection of cancer cells (classification). Inspired by the success of state space models, e.g., Mamba, for efficiently capturing long-term dependencies in language modeling, we propose DyG-Mamba, a new continuous state space model (SSM) for dynamic gra…
▽ More
Dynamic graph learning aims to uncover evolutionary laws in real-world systems, enabling accurate social recommendation (link prediction) or early detection of cancer cells (classification). Inspired by the success of state space models, e.g., Mamba, for efficiently capturing long-term dependencies in language modeling, we propose DyG-Mamba, a new continuous state space model (SSM) for dynamic graph learning. Specifically, we first found that using inputs as control signals for SSM is not suitable for continuous-time dynamic network data with irregular sampling intervals, resulting in models being insensitive to time information and lacking generalization properties. Drawing inspiration from the Ebbinghaus forgetting curve, which suggests that memory of past events is strongly correlated with time intervals rather than specific details of the events themselves, we directly utilize irregular time spans as control signals for SSM to achieve significant robustness and generalization. Through exhaustive experiments on 12 datasets for dynamic link prediction and dynamic node classification tasks, we found that DyG-Mamba achieves state-of-the-art performance on most of the datasets, while also demonstrating significantly improved computation and memory efficiency.
△ Less
Submitted 13 August, 2024;
originally announced August 2024.
-
Impacts of Darwinian Evolution on Pre-trained Deep Neural Networks
Authors:
Guodong Du,
Runhua Jiang,
Senqiao Yang,
Haoyang Li,
Wei Chen,
Keren Li,
Sim Kuan Goh,
Ho-Kin Tang
Abstract:
Darwinian evolution of the biological brain is documented through multiple lines of evidence, although the modes of evolutionary changes remain unclear. Drawing inspiration from the evolved neural systems (e.g., visual cortex), deep learning models have demonstrated superior performance in visual tasks, among others. While the success of training deep neural networks has been relying on back-propa…
▽ More
Darwinian evolution of the biological brain is documented through multiple lines of evidence, although the modes of evolutionary changes remain unclear. Drawing inspiration from the evolved neural systems (e.g., visual cortex), deep learning models have demonstrated superior performance in visual tasks, among others. While the success of training deep neural networks has been relying on back-propagation (BP) and its variants to learn representations from data, BP does not incorporate the evolutionary processes that govern biological neural systems. This work proposes a neural network optimization framework based on evolutionary theory. Specifically, BP-trained deep neural networks for visual recognition tasks obtained from the ending epochs are considered the primordial ancestors (initial population). Subsequently, the population evolved with differential evolution. Extensive experiments are carried out to examine the relationships between Darwinian evolution and neural network optimization, including the correspondence between datasets, environment, models, and living species. The empirical results show that the proposed framework has positive impacts on the network, with reduced over-fitting and an order of magnitude lower time complexity compared to BP. Moreover, the experiments show that the proposed framework performs well on deep neural networks and big datasets.
△ Less
Submitted 10 August, 2024;
originally announced August 2024.
-
A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?
Authors:
Xinyu Liu,
Shuyu Shen,
Boyan Li,
Peixian Ma,
Runzhi Jiang,
Yuyu Luo,
Yuxin Zhang,
Ju Fan,
Guoliang Li,
Nan Tang
Abstract:
Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of NL2SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of NL2SQL techniques powered by LLMs, covering its e…
▽ More
Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of NL2SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of NL2SQL techniques powered by LLMs, covering its entire lifecycle from the following four aspects: (1) Model: NL2SQL translation techniques that tackle not only NL ambiguity and under-specification, but also properly map NL with database schema and instances; (2) Data: From the collection of training data, data synthesis due to training data scarcity, to NL2SQL benchmarks; (3) Evaluation: Evaluating NL2SQL methods from multiple angles using different metrics and granularities; and (4) Error Analysis: analyzing NL2SQL errors to find the root cause and guiding NL2SQL models to evolve. Moreover, we provide a rule of thumb for developing NL2SQL solutions. Finally, we discuss the research challenges and open problems of NL2SQL in the LLMs era.
△ Less
Submitted 9 August, 2024;
originally announced August 2024.
-
Mathematical Programming For Adaptive Experiments
Authors:
Ethan Che,
Daniel R. Jiang,
Hongseok Namkoong,
Jimmy Wang
Abstract:
Adaptive experimentation can significantly improve statistical power, but standard algorithms overlook important practical issues including batched and delayed feedback, personalization, non-stationarity, multiple objectives, and constraints. To address these issues, the current algorithm design paradigm crafts tailored methods for each problem instance. Since it is infeasible to devise novel algo…
▽ More
Adaptive experimentation can significantly improve statistical power, but standard algorithms overlook important practical issues including batched and delayed feedback, personalization, non-stationarity, multiple objectives, and constraints. To address these issues, the current algorithm design paradigm crafts tailored methods for each problem instance. Since it is infeasible to devise novel algorithms for every real-world instance, practitioners often have to resort to suboptimal approximations that do not address all of their challenges. Moving away from developing bespoke algorithms for each setting, we present a mathematical programming view of adaptive experimentation that can flexibly incorporate a wide range of objectives, constraints, and statistical procedures. By formulating a dynamic program in the batched limit, our modeling framework enables the use of scalable optimization methods (e.g., SGD and auto-differentiation) to solve for treatment allocations. We evaluate our framework on benchmarks modeled after practical challenges such as non-stationarity, personalization, multi-objectives, and constraints. Unlike bespoke algorithms such as modified variants of Thomson sampling, our mathematical programming approach provides remarkably robust performance across instances.
△ Less
Submitted 8 August, 2024;
originally announced August 2024.
-
AExGym: Benchmarks and Environments for Adaptive Experimentation
Authors:
Jimmy Wang,
Ethan Che,
Daniel R. Jiang,
Hongseok Namkoong
Abstract:
Innovations across science and industry are evaluated using randomized trials (a.k.a. A/B tests). While simple and robust, such static designs are inefficient or infeasible for testing many hypotheses. Adaptive designs can greatly improve statistical power in theory, but they have seen limited adoption due to their fragility in practice. We present a benchmark for adaptive experimentation based on…
▽ More
Innovations across science and industry are evaluated using randomized trials (a.k.a. A/B tests). While simple and robust, such static designs are inefficient or infeasible for testing many hypotheses. Adaptive designs can greatly improve statistical power in theory, but they have seen limited adoption due to their fragility in practice. We present a benchmark for adaptive experimentation based on real-world datasets, highlighting prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity. Our benchmark aims to spur methodological development that puts practical performance (e.g., robustness) as a central concern, rather than mathematical guarantees on contrived instances. We release an open source library, AExGym, which is designed with modularity and extensibility in mind to allow experimentation practitioners to develop custom environments and algorithms.
△ Less
Submitted 8 August, 2024;
originally announced August 2024.
-
MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models
Authors:
Yuchen Dong,
XiaoXiang Fang,
Yuchen Hu,
Renshuang Jiang,
Zhe Jiang
Abstract:
The application of large language models to facilitate automated software operations and tool generation (SOTG), thus augmenting software productivity, mirrors the early stages of human evolution when the ability to create and use tools accelerated the progress of civilization. These complex tasks require AI to continuously summarize and improve. Current research often overlooks the importance of…
▽ More
The application of large language models to facilitate automated software operations and tool generation (SOTG), thus augmenting software productivity, mirrors the early stages of human evolution when the ability to create and use tools accelerated the progress of civilization. These complex tasks require AI to continuously summarize and improve. Current research often overlooks the importance of converting real-time task experiences into system memory and differentiating the value of existing knowledge for future reference. This paper addresses these issues by evolving external memory models into Memory-Loop Networks for timely memorization and experience referencing. We also enhance a RAG mechanism with knowledge precision segmentation to utilize memory based on value differentiation, and design the MaxMind model for SOTG accordingly.To demonstrate our approach, we developed MaxMind4Sheet, an electronic spreadsheet processing system aligned with the MaxMind philosophy. Comparative experiments with SheetCopilot have demonstrated that the accumulation and recycling of task memories lead to a steady enhancement in task success rate, with an improvement rate of approximately 3%-6% per round in this implementation example. Note that as the memories continue to grow, this cumulative improvement may be substantial. The inclusion of memory recycling can also boost the system's task execution efficiency by up to 25%, and it can address the retraining issue faced by LLMs when handling specialized tasks through memories transfer.These suggest that MaxMind has significant potential to enhance the capabilities and productivity of LLM systems in SOTG.
△ Less
Submitted 7 August, 2024;
originally announced August 2024.
-
Learning to Solve Bilevel Programs with Binary Tender
Authors:
Bo Zhou,
Ruiwei Jiang,
Siqian Shen
Abstract:
Bilevel programs (BPs) find a wide range of applications in fields such as energy, transportation, and machine learning. As compared to BPs with continuous (linear/convex) optimization problems in both levels, the BPs with discrete decision variables have received much less attention, largely due to the ensuing computational intractability and the incapability of gradient-based algorithms for hand…
▽ More
Bilevel programs (BPs) find a wide range of applications in fields such as energy, transportation, and machine learning. As compared to BPs with continuous (linear/convex) optimization problems in both levels, the BPs with discrete decision variables have received much less attention, largely due to the ensuing computational intractability and the incapability of gradient-based algorithms for handling discrete optimization formulations. In this paper, we develop deep learning techniques to address this challenge. Specifically, we consider a BP with binary tender, wherein the upper and lower levels are linked via binary variables. We train a neural network to approximate the optimal value of the lower-level problem, as a function of the binary tender. Then, we obtain a single-level reformulation of the BP through a mixed-integer representation of the value function. Furthermore, we conduct a comparative analysis between two types of neural networks: general neural networks and the novel input supermodular neural networks, studying their representational capacities. To solve high-dimensional BPs, we introduce an enhanced sampling method to generate higher-quality samples and implement an iterative process to refine solutions. We demonstrate the performance of these approaches through extensive numerical experiments, whose lower-level problems are linear and mixed-integer programs, respectively.
△ Less
Submitted 23 July, 2024;
originally announced July 2024.
-
Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions
Authors:
Kai Liu,
Zhihang Fu,
Chao Chen,
Sheng Jin,
Ze Chen,
Mingyuan Tao,
Rongxin Jiang,
Jieping Ye
Abstract:
The key to OOD detection has two aspects: generalized feature representation and precise category description. Recently, vision-language models such as CLIP provide significant advances in both two issues, but constructing precise category descriptions is still in its infancy due to the absence of unseen categories. This work introduces two hierarchical contexts, namely perceptual context and spur…
▽ More
The key to OOD detection has two aspects: generalized feature representation and precise category description. Recently, vision-language models such as CLIP provide significant advances in both two issues, but constructing precise category descriptions is still in its infancy due to the absence of unseen categories. This work introduces two hierarchical contexts, namely perceptual context and spurious context, to carefully describe the precise category boundary through automatic prompt tuning. Specifically, perceptual contexts perceive the inter-category difference (e.g., cats vs apples) for current classification tasks, while spurious contexts further identify spurious (similar but exactly not) OOD samples for every single category (e.g., cats vs panthers, apples vs peaches). The two contexts hierarchically construct the precise description for a certain category, which is, first roughly classifying a sample to the predicted category and then delicately identifying whether it is truly an ID sample or actually OOD. Moreover, the precise descriptions for those categories within the vision-language framework present a novel application: CATegory-EXtensible OOD detection (CATEX). One can efficiently extend the set of recognizable categories by simply merging the hierarchical contexts learned under different sub-task settings. And extensive experiments are conducted to demonstrate CATEX's effectiveness, robustness, and category-extensibility. For instance, CATEX consistently surpasses the rivals by a large margin with several protocols on the challenging ImageNet-1K dataset. In addition, we offer new insights on how to efficiently scale up the prompt engineering in vision-language models to recognize thousands of object categories, as well as how to incorporate large language models (like GPT-3) to boost zero-shot applications. Code will be made public soon.
△ Less
Submitted 23 July, 2024;
originally announced July 2024.
-
Educating LLMs like Human Students: Structure-aware Injection of Domain Knowledge
Authors:
Kai Liu,
Ze Chen,
Zhihang Fu,
Rongxin Jiang,
Fan Zhou,
Yaowu Chen,
Yue Wu,
Jieping Ye
Abstract:
This paper presents a pioneering methodology, termed StructTuning, to efficiently transform foundation Large Language Models (LLMs) into domain specialists. It significantly minimizes the training corpus requirement to a mere 0.3% while achieving an impressive 50% of traditional knowledge injection performance. Our method is inspired by the educational processes for human students, particularly ho…
▽ More
This paper presents a pioneering methodology, termed StructTuning, to efficiently transform foundation Large Language Models (LLMs) into domain specialists. It significantly minimizes the training corpus requirement to a mere 0.3% while achieving an impressive 50% of traditional knowledge injection performance. Our method is inspired by the educational processes for human students, particularly how structured domain knowledge from textbooks is absorbed and then applied to tackle real-world challenges through specific exercises. Based on this, we propose a novel two-stage knowledge injection strategy: Structure-aware Continual Pre-Training (SCPT) and Structure-aware Supervised Fine-Tuning (SSFT). In the SCPT phase, we organize the training data into an auto-generated taxonomy of domain knowledge, enabling LLMs to effectively memorize textual segments linked to specific expertise within the taxonomy's architecture. Subsequently, in the SSFT phase, we explicitly prompt models to reveal the underlying knowledge structure in their outputs, leveraging this structured domain insight to address practical problems adeptly. Our ultimate method has undergone extensive evaluations across model architectures and scales, using closed-book question-answering tasks on LongBench and MMedBench datasets. Remarkably, our method matches 50% of the improvement displayed by the state-of-the-art MMedLM2 on MMedBench, but with only 0.3% quantity of the training corpus. This breakthrough showcases the potential to scale up our StructTuning for stronger domain-specific LLMs. Code will be made public soon.
△ Less
Submitted 23 July, 2024;
originally announced July 2024.
-
Enhancing LLM's Cognition via Structurization
Authors:
Kai Liu,
Zhihang Fu,
Chao Chen,
Wei Zhang,
Rongxin Jiang,
Fan Zhou,
Yaowu Chen,
Yue Wu,
Jieping Ye
Abstract:
When reading long-form text, human cognition is complex and structurized. While large language models (LLMs) process input contexts through a causal and sequential perspective, this approach can potentially limit their ability to handle intricate and complex inputs effectively. To enhance LLM's cognition capability, this paper presents a novel concept of context structurization. Specifically, we t…
▽ More
When reading long-form text, human cognition is complex and structurized. While large language models (LLMs) process input contexts through a causal and sequential perspective, this approach can potentially limit their ability to handle intricate and complex inputs effectively. To enhance LLM's cognition capability, this paper presents a novel concept of context structurization. Specifically, we transform the plain, unordered contextual sentences into well-ordered and hierarchically structurized elements. By doing so, LLMs can better grasp intricate and extended contexts through precise attention and information-seeking along the organized structures. Extensive evaluations are conducted across various model architectures and sizes (including several 7B- to 72B-size auto-regressive LLMs as well as BERT-like masking models) on a diverse set of NLP tasks (e.g., context-based question-answering, exhaustive hallucination evaluation, and passage-level dense retrieval). Empirical results show consistent and significant performance gains afforded by a single-round structurization. In particular, we boost a 72B-parameter open-source model to achieve comparable performance against GPT-3.5-Turbo as the hallucination evaluator. Besides, we show the feasibility of distilling advanced LLMs' language processing abilities to a smaller yet effective StruXGPT-7B to execute structurization, addressing the practicality of our approach. Code will be made public soon.
△ Less
Submitted 23 July, 2024;
originally announced July 2024.
-
Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution
Authors:
Kai Liu,
Zhihang Fu,
Sheng Jin,
Chao Chen,
Ze Chen,
Rongxin Jiang,
Fan Zhou,
Yaowu Chen,
Jieping Ye
Abstract:
Detecting and rejecting unknown out-of-distribution (OOD) samples is critical for deployed neural networks to void unreliable predictions. In real-world scenarios, however, the efficacy of existing OOD detection methods is often impeded by the inherent imbalance of in-distribution (ID) data, which causes significant performance decline. Through statistical observations, we have identified two comm…
▽ More
Detecting and rejecting unknown out-of-distribution (OOD) samples is critical for deployed neural networks to void unreliable predictions. In real-world scenarios, however, the efficacy of existing OOD detection methods is often impeded by the inherent imbalance of in-distribution (ID) data, which causes significant performance decline. Through statistical observations, we have identified two common challenges faced by different OOD detectors: misidentifying tail class ID samples as OOD, while erroneously predicting OOD samples as head class from ID. To explain this phenomenon, we introduce a generalized statistical framework, termed ImOOD, to formulate the OOD detection problem on imbalanced data distribution. Consequently, the theoretical analysis reveals that there exists a class-aware bias item between balanced and imbalanced OOD detection, which contributes to the performance gap. Building upon this finding, we present a unified training-time regularization technique to mitigate the bias and boost imbalanced OOD detectors across architecture designs. Our theoretically grounded method translates into consistent improvements on the representative CIFAR10-LT, CIFAR100-LT, and ImageNet-LT benchmarks against several state-of-the-art OOD detection approaches. Code will be made public soon.
△ Less
Submitted 23 July, 2024;
originally announced July 2024.
-
ESOD: Efficient Small Object Detection on High-Resolution Images
Authors:
Kai Liu,
Zhihang Fu,
Sheng Jin,
Ze Chen,
Fan Zhou,
Rongxin Jiang,
Yaowu Chen,
Jieping Ye
Abstract:
Enlarging input images is a straightforward and effective approach to promote small object detection. However, simple image enlargement is significantly expensive on both computations and GPU memory. In fact, small objects are usually sparsely distributed and locally clustered. Therefore, massive feature extraction computations are wasted on the non-target background area of images. Recent works h…
▽ More
Enlarging input images is a straightforward and effective approach to promote small object detection. However, simple image enlargement is significantly expensive on both computations and GPU memory. In fact, small objects are usually sparsely distributed and locally clustered. Therefore, massive feature extraction computations are wasted on the non-target background area of images. Recent works have tried to pick out target-containing regions using an extra network and perform conventional object detection, but the newly introduced computation limits their final performance. In this paper, we propose to reuse the detector's backbone to conduct feature-level object-seeking and patch-slicing, which can avoid redundant feature extraction and reduce the computation cost. Incorporating a sparse detection head, we are able to detect small objects on high-resolution inputs (e.g., 1080P or larger) for superior performance. The resulting Efficient Small Object Detection (ESOD) approach is a generic framework, which can be applied to both CNN- and ViT-based detectors to save the computation and GPU memory costs. Extensive experiments demonstrate the efficacy and efficiency of our method. In particular, our method consistently surpasses the SOTA detectors by a large margin (e.g., 8% gains on AP) on the representative VisDrone, UAVDT, and TinyPerson datasets. Code will be made public soon.
△ Less
Submitted 23 July, 2024;
originally announced July 2024.
-
Datasets of Visualization for Machine Learning
Authors:
Can Liu,
Ruike Jiang,
Shaocong Tan,
Jiacheng Yu,
Chaofan Yang,
Hanning Shao,
Xiaoru Yuan
Abstract:
Datasets of visualization play a crucial role in automating data-driven visualization pipelines, serving as the foundation for supervised model training and algorithm benchmarking. In this paper, we survey the literature on visualization datasets and provide a comprehensive overview of existing visualization datasets, including their data types, formats, supported tasks, and openness. We propose a…
▽ More
Datasets of visualization play a crucial role in automating data-driven visualization pipelines, serving as the foundation for supervised model training and algorithm benchmarking. In this paper, we survey the literature on visualization datasets and provide a comprehensive overview of existing visualization datasets, including their data types, formats, supported tasks, and openness. We propose a what-why-how model for visualization datasets, considering the content of the dataset (what), the supported tasks (why), and the dataset construction process (how). This model provides a clear understanding of the diversity and complexity of visualization datasets. Additionally, we highlight the challenges faced by existing visualization datasets, including the lack of standardization in data types and formats and the limited availability of large-scale datasets. To address these challenges, we suggest future research directions.
△ Less
Submitted 23 July, 2024;
originally announced July 2024.
-
Artist: Aesthetically Controllable Text-Driven Stylization without Training
Authors:
Ruixiang Jiang,
Changwen Chen
Abstract:
Diffusion models entangle content and style generation during the denoising process, leading to undesired content modification when directly applied to stylization tasks. Existing methods struggle to effectively control the diffusion model to meet the aesthetic-level requirements for stylization. In this paper, we introduce \textbf{Artist}, a training-free approach that aesthetically controls the…
▽ More
Diffusion models entangle content and style generation during the denoising process, leading to undesired content modification when directly applied to stylization tasks. Existing methods struggle to effectively control the diffusion model to meet the aesthetic-level requirements for stylization. In this paper, we introduce \textbf{Artist}, a training-free approach that aesthetically controls the content and style generation of a pretrained diffusion model for text-driven stylization. Our key insight is to disentangle the denoising of content and style into separate diffusion processes while sharing information between them. We propose simple yet effective content and style control methods that suppress style-irrelevant content generation, resulting in harmonious stylization results. Extensive experiments demonstrate that our method excels at achieving aesthetic-level stylization requirements, preserving intricate details in the content image and aligning well with the style prompt. Furthermore, we showcase the highly controllability of the stylization strength from various perspectives. Code will be released, project home page: https://DiffusionArtist.github.io
△ Less
Submitted 22 July, 2024;
originally announced July 2024.
-
SOREL: A Stochastic Algorithm for Spectral Risks Minimization
Authors:
Yuze Ge,
Rujun Jiang
Abstract:
The spectral risk has wide applications in machine learning, especially in real-world decision-making, where people are not only concerned with models' average performance. By assigning different weights to the losses of different sample points, rather than the same weights as in the empirical risk, it allows the model's performance to lie between the average performance and the worst-case perform…
▽ More
The spectral risk has wide applications in machine learning, especially in real-world decision-making, where people are not only concerned with models' average performance. By assigning different weights to the losses of different sample points, rather than the same weights as in the empirical risk, it allows the model's performance to lie between the average performance and the worst-case performance. In this paper, we propose SOREL, the first stochastic gradient-based algorithm with convergence guarantees for the spectral risk minimization. Previous algorithms often consider adding a strongly concave function to smooth the spectral risk, thus lacking convergence guarantees for the original spectral risk. We theoretically prove that our algorithm achieves a near-optimal rate of $\widetilde{O}(1/\sqrtε)$ in terms of $ε$. Experiments on real datasets show that our algorithm outperforms existing algorithms in most cases, both in terms of runtime and sample complexity.
△ Less
Submitted 19 July, 2024;
originally announced July 2024.
-
GCS*: Forward Heuristic Search on Implicit Graphs of Convex Sets
Authors:
Shao Yuan Chew Chia,
Rebecca H. Jiang,
Bernhard Paus Graesdal,
Leslie Pack Kaelbling,
Russ Tedrake
Abstract:
We consider large-scale, implicit-search-based solutions to Shortest Path Problems on Graphs of Convex Sets (GCS). We propose GCS*, a forward heuristic search algorithm that generalizes A* search to the GCS setting, where a continuous-valued decision is made at each graph vertex, and constraints across graph edges couple these decisions, influencing costs and feasibility. Such mixed discrete-conti…
▽ More
We consider large-scale, implicit-search-based solutions to Shortest Path Problems on Graphs of Convex Sets (GCS). We propose GCS*, a forward heuristic search algorithm that generalizes A* search to the GCS setting, where a continuous-valued decision is made at each graph vertex, and constraints across graph edges couple these decisions, influencing costs and feasibility. Such mixed discrete-continuous planning is needed in many domains, including motion planning around obstacles and planning through contact. This setting provides a unique challenge for best-first search algorithms: the cost and feasibility of a path depend on continuous-valued points chosen along the entire path. We show that by pruning paths that are cost-dominated over their entire terminal vertex, GCS* can search efficiently while still guaranteeing cost-optimality and completeness. To find satisficing solutions quickly, we also present a complete but suboptimal variation, pruning instead reachability-dominated paths. We implement these checks using polyhedral-containment or sampling-based methods. The former implementation is complete and cost-optimal, while the latter is probabilistically complete and asymptotically cost-optimal and performs effectively even with minimal samples in practice. We demonstrate GCS* on planar pushing tasks where the combinatorial explosion of contact modes renders prior methods intractable and show it performs favorably compared to the state-of-the-art. Project website: https://shaoyuan.cc/research/gcs-star/
△ Less
Submitted 17 September, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks
Authors:
Lele Zheng,
Yang Cao,
Renhe Jiang,
Kenjiro Taura,
Yulong Shen,
Sheng Li,
Masatoshi Yoshikawa
Abstract:
Spatiotemporal federated learning has recently raised intensive studies due to its ability to train valuable models with only shared gradients in various location-based services. On the other hand, recent studies have shown that shared gradients may be subject to gradient inversion attacks (GIA) on images or texts. However, so far there has not been any systematic study of the gradient inversion a…
▽ More
Spatiotemporal federated learning has recently raised intensive studies due to its ability to train valuable models with only shared gradients in various location-based services. On the other hand, recent studies have shown that shared gradients may be subject to gradient inversion attacks (GIA) on images or texts. However, so far there has not been any systematic study of the gradient inversion attacks in spatiotemporal federated learning. In this paper, we explore the gradient attack problem in spatiotemporal federated learning from attack and defense perspectives. To understand privacy risks in spatiotemporal federated learning, we first propose Spatiotemporal Gradient Inversion Attack (ST-GIA), a gradient attack algorithm tailored to spatiotemporal data that successfully reconstructs the original location from gradients. Furthermore, we design an adaptive defense strategy to mitigate gradient inversion attacks in spatiotemporal federated learning. By dynamically adjusting the perturbation levels, we can offer tailored protection for varying rounds of training data, thereby achieving a better trade-off between privacy and utility than current state-of-the-art methods. Through intensive experimental analysis on three real-world datasets, we reveal that the proposed defense strategy can well preserve the utility of spatiotemporal federated learning with effective security protection.
△ Less
Submitted 15 July, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Deep Learning-based Anomaly Detection and Log Analysis for Computer Networks
Authors:
Shuzhan Wang,
Ruxue Jiang,
Zhaoqi Wang,
Yan Zhou
Abstract:
Computer network anomaly detection and log analysis, as an important topic in the field of network security, has been a key task to ensure network security and system reliability. First, existing network anomaly detection and log analysis methods are often challenged by high-dimensional data and complex network topologies, resulting in unstable performance and high false-positive rates. In additio…
▽ More
Computer network anomaly detection and log analysis, as an important topic in the field of network security, has been a key task to ensure network security and system reliability. First, existing network anomaly detection and log analysis methods are often challenged by high-dimensional data and complex network topologies, resulting in unstable performance and high false-positive rates. In addition, traditional methods are usually difficult to handle time-series data, which is crucial for anomaly detection and log analysis. Therefore, we need a more efficient and accurate method to cope with these problems. To compensate for the shortcomings of current methods, we propose an innovative fusion model that integrates Isolation Forest, GAN (Generative Adversarial Network), and Transformer with each other, and each of them plays a unique role. Isolation Forest is used to quickly identify anomalous data points, and GAN is used to generate synthetic data with the real data distribution characteristics to augment the training dataset, while the Transformer is used for modeling and context extraction on time series data. The synergy of these three components makes our model more accurate and robust in anomaly detection and log analysis tasks. We validate the effectiveness of this fusion model in an extensive experimental evaluation. Experimental results show that our model significantly improves the accuracy of anomaly detection while reducing the false alarm rate, which helps to detect potential network problems in advance. The model also performs well in the log analysis task and is able to quickly identify anomalous behaviors, which helps to improve the stability of the system. The significance of this study is that it introduces advanced deep learning techniques, which work anomaly detection and log analysis.
△ Less
Submitted 14 September, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
`Interaction annealing' to determine effective quantized valence and orbital structure: an illustration with ferro-orbital order in WTe$_2$
Authors:
Ruoshi Jiang,
Fangyuan Gu,
Wei Ku
Abstract:
Strongly correlated materials are known to display qualitatively distinct emergent behaviors at low energy. Conveniently, the superposition principle of quantum mechanics ensures that, upon absorbing quantum fluctuation, these rich low-energy behaviors can always be effectively described by dressed particles with fully quantized charge, spin, and orbitals structure. Such a powerful and simple desc…
▽ More
Strongly correlated materials are known to display qualitatively distinct emergent behaviors at low energy. Conveniently, the superposition principle of quantum mechanics ensures that, upon absorbing quantum fluctuation, these rich low-energy behaviors can always be effectively described by dressed particles with fully quantized charge, spin, and orbitals structure. Such a powerful and simple description is, however, difficult to access through density functional theory (DFT) calculations, since in terms of bare particles the quantum fluctuation would heavily smear the quantized quantities. To address this difficulty, we propose an `interaction annealing' approach to decipher the dominant valence and orbital structure by suppressing the charge fluctuation through enhancing ionic charging energy. Applying this approach to ferroelectric semi-metal WTe${_2}$ as a demonstration, we identify a dominant ferro-orbital ordered structure with W ion in a $d^2$ spin-0 configuration. The proposed approach is straightforward to implement in standard DFT calculations to grant additional access to essential low-energy physics.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Investigating the Segment Anything Foundation Model for Mapping Smallholder Agriculture Field Boundaries Without Training Labels
Authors:
Pratyush Tripathy,
Kathy Baylis,
Kyle Wu,
Jyles Watson,
Ruizhe Jiang
Abstract:
Accurate mapping of agricultural field boundaries is crucial for enhancing outcomes like precision agriculture, crop monitoring, and yield estimation. However, extracting these boundaries from satellite images is challenging, especially for smallholder farms and data-scarce environments. This study explores the Segment Anything Model (SAM) to delineate agricultural field boundaries in Bihar, India…
▽ More
Accurate mapping of agricultural field boundaries is crucial for enhancing outcomes like precision agriculture, crop monitoring, and yield estimation. However, extracting these boundaries from satellite images is challenging, especially for smallholder farms and data-scarce environments. This study explores the Segment Anything Model (SAM) to delineate agricultural field boundaries in Bihar, India, using 2-meter resolution SkySat imagery without additional training. We evaluate SAM's performance across three model checkpoints, various input sizes, multi-date satellite images, and edge-enhanced imagery. Our results show that SAM correctly identifies about 58% of field boundaries, comparable to other approaches requiring extensive training data. Using different input image sizes improves accuracy, with the most significant improvement observed when using multi-date satellite images. This work establishes proof of concept for using SAM and maximizing its potential in agricultural field boundary mapping. Our work highlights SAM's potential in delineating agriculture field boundary in training-data scarce settings to enable a wide range of agriculture related analysis.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Enhancing Spatio-temporal Quantile Forecasting with Curriculum Learning: Lessons Learned
Authors:
Du Yin,
Jinliang Deng,
Shuang Ao,
Zechen Li,
Hao Xue,
Arian Prabowo,
Renhe Jiang,
Xuan Song,
Flora Salim
Abstract:
Training models on spatio-temporal (ST) data poses an open problem due to the complicated and diverse nature of the data itself, and it is challenging to ensure the model's performance directly trained on the original ST data. While limiting the variety of training data can make training easier, it can also lead to a lack of knowledge and information for the model, resulting in a decrease in perfo…
▽ More
Training models on spatio-temporal (ST) data poses an open problem due to the complicated and diverse nature of the data itself, and it is challenging to ensure the model's performance directly trained on the original ST data. While limiting the variety of training data can make training easier, it can also lead to a lack of knowledge and information for the model, resulting in a decrease in performance. To address this challenge, we presented an innovative paradigm that incorporates three separate forms of curriculum learning specifically targeting from spatial, temporal, and quantile perspectives. Furthermore, our framework incorporates a stacking fusion module to combine diverse information from three types of curriculum learning, resulting in a strong and thorough learning process. We demonstrated the effectiveness of this framework with extensive empirical evaluations, highlighting its better performance in addressing complex ST challenges. We provided thorough ablation studies to investigate the effectiveness of our curriculum and to explain how it contributes to the improvement of learning efficiency on ST data.
△ Less
Submitted 16 September, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Knowledge Fusion By Evolving Weights of Language Models
Authors:
Guodong Du,
Jing Li,
Hanting Liu,
Runhua Jiang,
Shuyang Yu,
Yifei Guo,
Sim Kuan Goh,
Ho-Kin Tang
Abstract:
Fine-tuning pre-trained language models, particularly large language models, demands extensive computing resources and can result in varying performance outcomes across different domains and datasets. This paper examines the approach of integrating multiple models from diverse training scenarios into a unified model. This unified model excels across various data domains and exhibits the ability to…
▽ More
Fine-tuning pre-trained language models, particularly large language models, demands extensive computing resources and can result in varying performance outcomes across different domains and datasets. This paper examines the approach of integrating multiple models from diverse training scenarios into a unified model. This unified model excels across various data domains and exhibits the ability to generalize well on out-of-domain data. We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms, which does not need further training or additional training data. Specifically, our method involves aggregating the weights of different language models into a population and subsequently generating offspring models through mutation and crossover operations. These offspring models are then evaluated against their parents, allowing for the preservation of those models that show enhanced performance on development datasets. Importantly, our model evolving strategy can be seamlessly integrated with existing model merging frameworks, offering a versatile tool for model enhancement. Experimental results on mainstream language models (i.e., encoder-only, decoder-only, encoder-decoder) reveal that Evolver outperforms previous state-of-the-art models by large margins. The code is publicly available at {https://github.com/duguodong7/model-evolution}.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
A Survey on Human Preference Learning for Large Language Models
Authors:
Ruili Jiang,
Kehai Chen,
Xuefeng Bai,
Zhixuan He,
Juntao Li,
Muyun Yang,
Tiejun Zhao,
Liqiang Nie,
Min Zhang
Abstract:
The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a wide range of contexts. Despite the numerous related studies conducted, a perspective on how human preferences are introduced into LLMs remains limited, which ma…
▽ More
The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a wide range of contexts. Despite the numerous related studies conducted, a perspective on how human preferences are introduced into LLMs remains limited, which may prevent a deeper comprehension of the relationships between human preferences and LLMs as well as the realization of their limitations. In this survey, we review the progress in exploring human preference learning for LLMs from a preference-centered perspective, covering the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs. We first categorize the human feedback according to data sources and formats. We then summarize techniques for human preferences modeling and compare the advantages and disadvantages of different schools of models. Moreover, we present various preference usage methods sorted by the objectives to utilize human preference signals. Finally, we summarize some prevailing approaches to evaluate LLMs in terms of alignment with human intentions and discuss our outlooks on the human intention alignment for LLMs.
△ Less
Submitted 18 June, 2024; v1 submitted 16 June, 2024;
originally announced June 2024.
-
Convergence Analysis of Adaptive Gradient Methods under Refined Smoothness and Noise Assumptions
Authors:
Devyani Maladkar,
Ruichen Jiang,
Aryan Mokhtari
Abstract:
Adaptive gradient methods are arguably the most successful optimization algorithms for neural network training. While it is well-known that adaptive gradient methods can achieve better dimensional dependence than stochastic gradient descent (SGD) under favorable geometry for stochastic convex optimization, the theoretical justification for their success in stochastic non-convex optimization remain…
▽ More
Adaptive gradient methods are arguably the most successful optimization algorithms for neural network training. While it is well-known that adaptive gradient methods can achieve better dimensional dependence than stochastic gradient descent (SGD) under favorable geometry for stochastic convex optimization, the theoretical justification for their success in stochastic non-convex optimization remains elusive. In this paper, we aim to close this gap by analyzing the convergence rates of AdaGrad measured by the $\ell_1$-norm of the gradient. Specifically, when the objective has $L$-Lipschitz gradient and the stochastic gradient variance is bounded by $σ^2$, we prove a worst-case convergence rate of $\tilde{\mathcal{O}}(\frac{\sqrt{d}L}{\sqrt{T}} + \frac{\sqrt{d} σ}{T^{1/4}})$, where $d$ is the dimension of the problem.We also present a lower bound of $Ω(\frac{\sqrt{d}}{\sqrt{T}})$ for minimizing the gradient $\ell_1$-norm in the deterministic setting, showing the tightness of our upper bound in the noiseless case. Moreover, under more fine-grained assumptions on the smoothness structure of the objective and the gradient noise and under favorable gradient $\ell_1/\ell_2$ geometry, we show that AdaGrad can potentially shave a factor of $\sqrt{d}$ compared to SGD. To the best of our knowledge, this is the first result for adaptive gradient methods that demonstrates a provable gain over SGD in the non-convex setting.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
CADE: Cosine Annealing Differential Evolution for Spiking Neural Network
Authors:
Runhua Jiang,
Guodong Du,
Shuyang Yu,
Yifei Guo,
Sim Kuan Goh,
Ho-Kin Tang
Abstract:
Spiking neural networks (SNNs) have gained prominence for their potential in neuromorphic computing and energy-efficient artificial intelligence, yet optimizing them remains a formidable challenge for gradient-based methods due to their discrete, spike-based computation. This paper attempts to tackle the challenges by introducing Cosine Annealing Differential Evolution (CADE), designed to modulate…
▽ More
Spiking neural networks (SNNs) have gained prominence for their potential in neuromorphic computing and energy-efficient artificial intelligence, yet optimizing them remains a formidable challenge for gradient-based methods due to their discrete, spike-based computation. This paper attempts to tackle the challenges by introducing Cosine Annealing Differential Evolution (CADE), designed to modulate the mutation factor (F) and crossover rate (CR) of differential evolution (DE) for the SNN model, i.e., Spiking Element Wise (SEW) ResNet. Extensive empirical evaluations were conducted to analyze CADE. CADE showed a balance in exploring and exploiting the search space, resulting in accelerated convergence and improved accuracy compared to existing gradient-based and DE-based methods. Moreover, an initialization method based on a transfer learning setting was developed, pretraining on a source dataset (i.e., CIFAR-10) and fine-tuning the target dataset (i.e., CIFAR-100), to improve population diversity. It was found to further enhance CADE for SNN. Remarkably, CADE elevates the performance of the highest accuracy SEW model by an additional 0.52 percentage points, underscoring its effectiveness in fine-tuning and enhancing SNNs. These findings emphasize the pivotal role of a scheduler for F and CR adjustment, especially for DE-based SNN. Source Code on Github: https://github.com/Tank-Jiang/CADE4SNN.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization
Authors:
Ruichen Jiang,
Ali Kavis,
Qiujiang Jin,
Sujay Sanghavi,
Aryan Mokhtari
Abstract:
We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving only one linear system per iteration, eliminating the need for line search or backtracking mechanisms. Specifically, we base our algorithms on the optimistic meth…
▽ More
We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving only one linear system per iteration, eliminating the need for line search or backtracking mechanisms. Specifically, we base our algorithms on the optimistic method and appropriately combine it with second-order information. Moreover, distinct from common adaptive schemes, we define the step size recursively as a function of the gradient norm and the prediction error in the optimistic update. We first analyze a variant where the step size requires knowledge of the Lipschitz constant of the Hessian. Under the additional assumption of Lipschitz continuous gradients, we further design a parameter-free version by tracking the Hessian Lipschitz constant locally and ensuring the iterates remain bounded. We also evaluate the practical performance of our algorithm by comparing it to existing second-order algorithms for minimax optimization.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Stochastic Newton Proximal Extragradient Method
Authors:
Ruichen Jiang,
Michał Dereziński,
Aryan Mokhtari
Abstract:
Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only when the stochastic Hessian noise diminishes, increasing per-iteration costs over time. Recent work in [arXiv:2204.09266] addressed this with a Hessian averaging scheme that…
▽ More
Stochastic second-order methods achieve fast local convergence in strongly convex optimization by using noisy Hessian estimates to precondition the gradient. However, these methods typically reach superlinear convergence only when the stochastic Hessian noise diminishes, increasing per-iteration costs over time. Recent work in [arXiv:2204.09266] addressed this with a Hessian averaging scheme that achieves superlinear convergence without higher per-iteration costs. Nonetheless, the method has slow global convergence, requiring up to $\tilde{O}(κ^2)$ iterations to reach the superlinear rate of $\tilde{O}((1/t)^{t/2})$, where $κ$ is the problem's condition number. In this paper, we propose a novel stochastic Newton proximal extragradient method that improves these bounds, achieving a faster global linear rate and reaching the same fast superlinear rate in $\tilde{O}(κ)$ iterations. We accomplish this by extending the Hybrid Proximal Extragradient (HPE) framework, achieving fast global and local convergence rates for strongly convex functions with access to a noisy Hessian oracle.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation
Authors:
Kejia Yin,
Varshanth R. Rao,
Ruowei Jiang,
Xudong Liu,
Parham Aarabi,
David B. Lindell
Abstract:
Self-supervised landmark estimation is a challenging task that demands the formation of locally distinct feature representations to identify sparse facial landmarks in the absence of annotated data. To tackle this task, existing state-of-the-art (SOTA) methods (1) extract coarse features from backbones that are trained with instance-level self-supervised learning (SSL) paradigms, which neglect the…
▽ More
Self-supervised landmark estimation is a challenging task that demands the formation of locally distinct feature representations to identify sparse facial landmarks in the absence of annotated data. To tackle this task, existing state-of-the-art (SOTA) methods (1) extract coarse features from backbones that are trained with instance-level self-supervised learning (SSL) paradigms, which neglect the dense prediction nature of the task, (2) aggregate them into memory-intensive hypercolumn formations, and (3) supervise lightweight projector networks to naively establish full local correspondences among all pairs of spatial features. In this paper, we introduce SCE-MAE, a framework that (1) leverages the MAE, a region-level SSL method that naturally better suits the landmark prediction task, (2) operates on the vanilla feature map instead of on expensive hypercolumns, and (3) employs a Correspondence Approximation and Refinement Block (CARB) that utilizes a simple density peak clustering algorithm and our proposed Locality-Constrained Repellence Loss to directly hone only select local correspondences. We demonstrate through extensive experiments that SCE-MAE is highly effective and robust, outperforming existing SOTA methods by large margins of approximately 20%-44% on the landmark matching and approximately 9%-15% on the landmark detection tasks.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
Continuous Temporal Domain Generalization
Authors:
Zekun Cai,
Guangji Bai,
Renhe Jiang,
Xuan Song,
Liang Zhao
Abstract:
Temporal Domain Generalization (TDG) addresses the challenge of training predictive models under temporally varying data distributions. Traditional TDG approaches typically focus on domain data collected at fixed, discrete time intervals, which limits their capability to capture the inherent dynamics within continuous-evolving and irregularly-observed temporal domains. To overcome this, this work…
▽ More
Temporal Domain Generalization (TDG) addresses the challenge of training predictive models under temporally varying data distributions. Traditional TDG approaches typically focus on domain data collected at fixed, discrete time intervals, which limits their capability to capture the inherent dynamics within continuous-evolving and irregularly-observed temporal domains. To overcome this, this work formalizes the concept of Continuous Temporal Domain Generalization (CTDG), where domain data are derived from continuous times and are collected at arbitrary times. CTDG tackles critical challenges including: 1) Characterizing the continuous dynamics of both data and models, 2) Learning complex high-dimensional nonlinear dynamics, and 3) Optimizing and controlling the generalization across continuous temporal domains. To address them, we propose a Koopman operator-driven continuous temporal domain generalization (Koodos) framework. We formulate the problem within a continuous dynamic system and leverage the Koopman theory to learn the underlying dynamics; the framework is further enhanced with a comprehensive optimization strategy equipped with analysis and control driven by prior knowledge of the dynamics patterns. Extensive experiments demonstrate the effectiveness and efficiency of our approach.
△ Less
Submitted 25 May, 2024;
originally announced May 2024.
-
Adaptive Finite Element Method for a Nonlinear Helmholtz Equation with High Wave Number
Authors:
Run Jiang,
Haijun Wu,
Yifeng Xu,
Jun Zou
Abstract:
A nonlinear Helmholtz (NLH) equation with high frequencies and corner singularities is discretized by the linear finite element method (FEM). After deriving some wave-number-explicit stability estimates and the singularity decomposition for the NLH problem, a priori stability and error estimates are established for the FEM on shape regular meshes including the case of locally refined meshes. Then…
▽ More
A nonlinear Helmholtz (NLH) equation with high frequencies and corner singularities is discretized by the linear finite element method (FEM). After deriving some wave-number-explicit stability estimates and the singularity decomposition for the NLH problem, a priori stability and error estimates are established for the FEM on shape regular meshes including the case of locally refined meshes. Then a posteriori upper and lower bounds using a new residual-type error estimator, which is equivalent to the standard one, are derived for the FE solutions to the NLH problem. These a posteriori estimates have confirmed a significant fact that is also valid for the NLH problem, namely the residual-type estimator seriously underestimates the error of the FE solution in the preasymptotic regime, which was first observed by Babuška et al. [Int J Numer Methods Eng 40 (1997)] for a one-dimensional linear problem. Based on the new a posteriori error estimator, both the convergence and the quasi-optimality of the resulting adaptive finite element algorithm are proved the first time for the NLH problem, when the initial mesh size lying in the preasymptotic regime. Finally, numerical examples are presented to validate the theoretical findings and demonstrate that applying the continuous interior penalty (CIP) technique with appropriate penalty parameters can reduce the pollution errors efficiently. In particular, the nonlinear phenomenon of optical bistability with Gaussian incident waves is successfully simulated by the adaptive CIPFEM.
△ Less
Submitted 27 May, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Heterogeneity-Informed Meta-Parameter Learning for Spatiotemporal Time Series Forecasting
Authors:
Zheng Dong,
Renhe Jiang,
Haotian Gao,
Hangchen Liu,
Jinliang Deng,
Qingsong Wen,
Xuan Song
Abstract:
Spatiotemporal time series forecasting plays a key role in a wide range of real-world applications. While significant progress has been made in this area, fully capturing and leveraging spatiotemporal heterogeneity remains a fundamental challenge. Therefore, we propose a novel Heterogeneity-Informed Meta-Parameter Learning scheme. Specifically, our approach implicitly captures spatiotemporal heter…
▽ More
Spatiotemporal time series forecasting plays a key role in a wide range of real-world applications. While significant progress has been made in this area, fully capturing and leveraging spatiotemporal heterogeneity remains a fundamental challenge. Therefore, we propose a novel Heterogeneity-Informed Meta-Parameter Learning scheme. Specifically, our approach implicitly captures spatiotemporal heterogeneity through learning spatial and temporal embeddings, which can be viewed as a clustering process. Then, a novel spatiotemporal meta-parameter learning paradigm is proposed to learn spatiotemporal-specific parameters from meta-parameter pools, which is informed by the captured heterogeneity. Based on these ideas, we develop a Heterogeneity-Informed Spatiotemporal Meta-Network (HimNet) for spatiotemporal time series forecasting. Extensive experiments on five widely-used benchmarks demonstrate our method achieves state-of-the-art performance while exhibiting superior interpretability. Our code is available at https://github.com/XDZhelheim/HimNet.
△ Less
Submitted 3 September, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
RF-based Energy Harvesting: Nonlinear Models, Applications and Challenges
Authors:
Ruihong Jiang
Abstract:
So far, various aspects associated with wireless energy harvesting (EH) have been investigated from diverse perspectives, including energy sources and models, usage protocols, energy scheduling and optimization, and EH implementation in different wireless communication systems. However, a comprehensive survey specifically focusing on models of radio frequency (RF)-based EH behaviors has not yet be…
▽ More
So far, various aspects associated with wireless energy harvesting (EH) have been investigated from diverse perspectives, including energy sources and models, usage protocols, energy scheduling and optimization, and EH implementation in different wireless communication systems. However, a comprehensive survey specifically focusing on models of radio frequency (RF)-based EH behaviors has not yet been presented. To address this gap, this article provides an overview of the mainstream mathematical models that capture the nonlinear behavior of practical EH circuits, serving as a valuable handbook of mathematical models for EH application research. Moreover, we summarize the application of each nonlinear EH model, including the associated challenges and precautions. We also analyze the impact and advancements of each EH model on RF-based EH systems in wireless communication, utilizing artificial intelligence (AI) techniques. Additionally, we highlight emerging research directions in the context of nonlinear RF-based EH. This article aims to contribute to the future application of RF-based EH in novel communication research domains to a significant extent.
△ Less
Submitted 8 May, 2024;
originally announced May 2024.
-
Decision-Dependent Uncertainty-Aware Distribution System Planning Under Wildfire Risk
Authors:
Felipe Piancó,
Alexandre Moreira,
Bruno Fanzeres,
Ruiwei Jiang,
Chaoyue Zhao,
Miguel Heleno
Abstract:
The interaction between power systems and wildfires can be dangerous and costly. Damaged structures, load shedding, and high operational costs are potential consequences when the grid is unprepared. In fact, the operation of distribution grids can be liable for the outbreak of wildfires when extreme weather conditions arise. Within this context, investment planning should consider the impact of op…
▽ More
The interaction between power systems and wildfires can be dangerous and costly. Damaged structures, load shedding, and high operational costs are potential consequences when the grid is unprepared. In fact, the operation of distribution grids can be liable for the outbreak of wildfires when extreme weather conditions arise. Within this context, investment planning should consider the impact of operational actions on the uncertainty related to wildfires that can directly affect line failure likelihood. Neglecting this can compromise the cost-benefit evaluation in planning system investments for wildfire risk. In this paper, we propose a decision-dependent uncertainty (DDU) aware methodology that provides the optimal portfolio of investments for distribution systems while considering that high power-flow levels through line segments in high-threat areas can ignite wildfires and, therefore, increase the probability of line failures. The methodology identifies the best combination of system upgrades (installation of new lines, hardening existing lines, and placement of switching devices) to provide the necessary leeway to operate the distribution system under wildfire-prone conditions. Our case study demonstrates that by modeling the DDU relationship between power flow prescriptions and line failures, investment decisions are more accurate and better prepare the grid infrastructure to deal with wildfire risk.
△ Less
Submitted 8 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Multi-Modality Spatio-Temporal Forecasting via Self-Supervised Learning
Authors:
Jiewen Deng,
Renhe Jiang,
Jiaqi Zhang,
Xuan Song
Abstract:
Multi-modality spatio-temporal (MoST) data extends spatio-temporal (ST) data by incorporating multiple modalities, which is prevalent in monitoring systems, encompassing diverse traffic demands and air quality assessments. Despite significant strides in ST modeling in recent years, there remains a need to emphasize harnessing the potential of information from different modalities. Robust MoST fore…
▽ More
Multi-modality spatio-temporal (MoST) data extends spatio-temporal (ST) data by incorporating multiple modalities, which is prevalent in monitoring systems, encompassing diverse traffic demands and air quality assessments. Despite significant strides in ST modeling in recent years, there remains a need to emphasize harnessing the potential of information from different modalities. Robust MoST forecasting is more challenging because it possesses (i) high-dimensional and complex internal structures and (ii) dynamic heterogeneity caused by temporal, spatial, and modality variations. In this study, we propose a novel MoST learning framework via Self-Supervised Learning, namely MoSSL, which aims to uncover latent patterns from temporal, spatial, and modality perspectives while quantifying dynamic heterogeneity. Experiment results on two real-world MoST datasets verify the superiority of our approach compared with the state-of-the-art baselines. Model implementation is available at https://github.com/beginner-sketch/MoSSL.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Community-Invariant Graph Contrastive Learning
Authors:
Shiyin Tan,
Dongyuan Li,
Renhe Jiang,
Ying Zhang,
Manabu Okumura
Abstract:
Graph augmentation has received great attention in recent years for graph contrastive learning (GCL) to learn well-generalized node/graph representations. However, mainstream GCL methods often favor randomly disrupting graphs for augmentation, which shows limited generalization and inevitably leads to the corruption of high-level graph information, i.e., the graph community. Moreover, current know…
▽ More
Graph augmentation has received great attention in recent years for graph contrastive learning (GCL) to learn well-generalized node/graph representations. However, mainstream GCL methods often favor randomly disrupting graphs for augmentation, which shows limited generalization and inevitably leads to the corruption of high-level graph information, i.e., the graph community. Moreover, current knowledge-based graph augmentation methods can only focus on either topology or node features, causing the model to lack robustness against various types of noise. To address these limitations, this research investigated the role of the graph community in graph augmentation and figured out its crucial advantage for learnable graph augmentation. Based on our observations, we propose a community-invariant GCL framework to maintain graph community structure during learnable graph augmentation. By maximizing the spectral changes, this framework unifies the constraints of both topology and feature augmentation, enhancing the model's robustness. Empirical evidence on 21 benchmark datasets demonstrates the exclusive merits of our framework. Code is released on Github (https://github.com/ShiyinTan/CI-GCL.git).
△ Less
Submitted 2 May, 2024;
originally announced May 2024.
-
Some inequalities related to Riesz transform on exterior Lipschitz domains
Authors:
Renjin Jiang,
Sibei Yang
Abstract:
Let $n\ge2$ and $\mathcal{L}=-\mathrm{div}(A\nabla\cdot)$ be an elliptic operator on $\mathbb{R}^n$. Given an exterior Lipschitz domain $Ω$, let $\mathcal{L}_D$ and $\mathcal{L}_N$ be the elliptic operators $\mathcal{L}$ on $Ω$ subject to the Dirichlet and the Neumann boundary {conditions}, respectively. For the Neumann operator, we show that the reverse inequality…
▽ More
Let $n\ge2$ and $\mathcal{L}=-\mathrm{div}(A\nabla\cdot)$ be an elliptic operator on $\mathbb{R}^n$. Given an exterior Lipschitz domain $Ω$, let $\mathcal{L}_D$ and $\mathcal{L}_N$ be the elliptic operators $\mathcal{L}$ on $Ω$ subject to the Dirichlet and the Neumann boundary {conditions}, respectively. For the Neumann operator, we show that the reverse inequality $\|\mathcal{L}_N^{1/2}f\|_{L^p(Ω)} \le C\|\nabla f\|_{L^p(Ω)}$ holds true for any $p\in(1,\infty)$. For the Dirichlet operator, it was known that the Riesz operator $\nabla \mathcal{L}_D^{-1/2}$ is not bounded for $p>2$ and $p\ge n$, even if $\mathcal{L}=-Δ$ being the Laplace operator. Suppose that $A$ are CMO coefficients or VMO coefficients satisfying certain perturbation property, and $\partialΩ$ is $C^1$, we prove that for $p>2$ and $p\in [n,\infty)$, it holds $$ \inf_{φ\in\mathcal{A}^p_0(Ω)}\left\|\nabla f-\nablaφ\right\|_{L^p(Ω)}\le C\left\|\mathcal{L}^{1/2}_D f\right\|_{L^p(Ω)} $$ for $f\in \dot{W}^{1,p}_0(Ω)$. Here $\mathcal{A}^p_0(Ω)=\{f\in \dot{W}^{1,p}_0(Ω):\,\mathcal{L}_Df=0\}$ is a non-trivial subspace generated by harmonic function in $Ω$ with zero boundary value.
△ Less
Submitted 25 April, 2024;
originally announced May 2024.
-
A Survey on Deep Active Learning: Recent Advances and New Frontiers
Authors:
Dongyuan Li,
Zhen Wang,
Yankai Chen,
Renhe Jiang,
Weiping Ding,
Manabu Okumura
Abstract:
Active learning seeks to achieve strong performance with fewer training samples. It does this by iteratively asking an oracle to label new selected samples in a human-in-the-loop manner. This technique has gained increasing popularity due to its broad applicability, yet its survey papers, especially for deep learning-based active learning (DAL), remain scarce. Therefore, we conduct an advanced and…
▽ More
Active learning seeks to achieve strong performance with fewer training samples. It does this by iteratively asking an oracle to label new selected samples in a human-in-the-loop manner. This technique has gained increasing popularity due to its broad applicability, yet its survey papers, especially for deep learning-based active learning (DAL), remain scarce. Therefore, we conduct an advanced and comprehensive survey on DAL. We first introduce reviewed paper collection and filtering. Second, we formally define the DAL task and summarize the most influential baselines and widely used datasets. Third, we systematically provide a taxonomy of DAL methods from five perspectives, including annotation types, query strategies, deep model architectures, learning paradigms, and training processes, and objectively analyze their strengths and weaknesses. Then, we comprehensively summarize main applications of DAL in Natural Language Processing (NLP), Computer Vision (CV), and Data Mining (DM), etc. Finally, we discuss challenges and perspectives after a detailed analysis of current studies. This work aims to serve as a useful and quick guide for researchers in overcoming difficulties in DAL. We hope that this survey will spur further progress in this burgeoning field.
△ Less
Submitted 15 July, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
Non-asymptotic Global Convergence Analysis of BFGS with the Armijo-Wolfe Line Search
Authors:
Qiujiang Jin,
Ruichen Jiang,
Aryan Mokhtari
Abstract:
In this paper, we establish the first explicit and non-asymptotic global convergence analysis of the BFGS method when deployed with an inexact line search scheme that satisfies the Armijo-Wolfe conditions. We show that BFGS achieves a global convergence rate of $(1-\frac{1}κ)^k$ for $μ$-strongly convex functions with $L$-Lipschitz gradients, where $κ=\frac{L}μ$ denotes the condition number. Furthe…
▽ More
In this paper, we establish the first explicit and non-asymptotic global convergence analysis of the BFGS method when deployed with an inexact line search scheme that satisfies the Armijo-Wolfe conditions. We show that BFGS achieves a global convergence rate of $(1-\frac{1}κ)^k$ for $μ$-strongly convex functions with $L$-Lipschitz gradients, where $κ=\frac{L}μ$ denotes the condition number. Furthermore, if the objective function's Hessian is Lipschitz, BFGS with the Armijo-Wolfe line search achieves a linear convergence rate only determined by the line search parameters and independent of the condition number. These results hold for any initial point $x_0$ and any symmetric positive definite initial Hessian approximation matrix $B_0$, although the choice of $B_0$ affects the iteration count required to attain these rates. Specifically, we show that for $B_0 = LI$, the rate of $O((1-\frac{1}κ)^k)$ appears from the first iteration, while for $B_0 = μI$, it takes $d\log κ$ iterations. Conversely, the condition number-independent linear convergence rate for $B_0 = LI$ occurs after $O\left(κ\left(d +\frac{M \sqrt{f(x_0)-f(x_*)}}{μ^{3/2}}\right)\right)$ iterations, whereas for $B_0 = μI$, it holds after $O\left(\frac{M \sqrt{f(x_0)-f(x_*)}}{μ^{3/2}}\left(d\log κ+ κ\right)\right)$ iterations. Here, $d$ denotes the dimension of the problem, $M$ is the Lipschitz parameter of the Hessian, and $x_*$ denotes the optimal solution. We further leverage these global linear convergence results to characterize the overall iteration complexity of BFGS when deployed with the Armijo-Wolfe line search.
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL
Authors:
Lang Qin,
Ziming Wang,
Runhao Jiang,
Rui Yan,
Huajin Tang
Abstract:
Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms,…
▽ More
Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms, the simulation results of multiple time steps can only correspond to a single-step decision in RL. This is quite different from the real temporal dynamics in the brain and also fails to fully exploit the capacity of SNNs to process temporal data. In order to address this temporal mismatch issue and further take advantage of the inherent temporal dynamics of spiking neurons, we propose a novel temporal alignment paradigm (TAP) that leverages the single-step update of spiking neurons to accumulate historical state information in RL and introduces gated units to enhance the memory capacity of spiking neurons. Experimental results show that our method can solve partially observable Markov decision processes (POMDPs) and multi-agent cooperation problems with similar performance as recurrent neural networks (RNNs) but with about 50% power consumption.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Boolean Matching Reversible Circuits: Algorithm and Complexity
Authors:
Tian-Fu Chen,
Jie-Hong R. Jiang
Abstract:
Boolean matching is an important problem in logic synthesis and verification. Despite being well-studied for conventional Boolean circuits, its treatment for reversible logic circuits remains largely, if not completely, missing. This work provides the first such study. Given two (black-box) reversible logic circuits that are promised to be matchable, we check their equivalences under various input…
▽ More
Boolean matching is an important problem in logic synthesis and verification. Despite being well-studied for conventional Boolean circuits, its treatment for reversible logic circuits remains largely, if not completely, missing. This work provides the first such study. Given two (black-box) reversible logic circuits that are promised to be matchable, we check their equivalences under various input/output negation and permutation conditions subject to the availability/unavailability of their inverse circuits. Notably, among other results, we show that the equivalence up to input negation and permutation is solvable in quantum polynomial time, while its classical complexity is exponential. This result is arguably the first demonstration of quantum exponential speedup in solving design automation problems. Also, as a negative result, we show that the equivalence up to both input and output negations is not solvable in quantum polynomial time unless UNIQUE-SAT is, which is unlikely. This work paves the theoretical foundation of Boolean matching reversible circuits for potential applications, e.g., in quantum circuit synthesis.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
Residual Connections Harm Abstract Feature Learning in Masked Autoencoders
Authors:
Xiao Zhang,
Ruoxi Jiang,
William Gao,
Rebecca Willett,
Michael Maire
Abstract:
We demonstrate that adding a weighting factor to decay the strength of identity shortcuts within residual networks substantially improves semantic feature learning in the state-of-the-art self-supervised masked autoencoding (MAE) paradigm. Our modification to the identity shortcuts within a VIT-B/16 backbone of an MAE boosts linear probing accuracy on ImageNet from 67.8% to 72.7%. This significant…
▽ More
We demonstrate that adding a weighting factor to decay the strength of identity shortcuts within residual networks substantially improves semantic feature learning in the state-of-the-art self-supervised masked autoencoding (MAE) paradigm. Our modification to the identity shortcuts within a VIT-B/16 backbone of an MAE boosts linear probing accuracy on ImageNet from 67.8% to 72.7%. This significant gap suggests that, while residual connection structure serves an essential role in facilitating gradient propagation, it may have a harmful side effect of reducing capacity for abstract learning by virtue of injecting an echo of shallower representations into deeper layers. We ameliorate this downside via a fixed formula for monotonically decreasing the contribution of identity connections as layer depth increases. Our design promotes the gradual development of feature abstractions, without impacting network trainability. Analyzing the representations learned by our modified residual networks, we find correlation between low effective feature rank and downstream task performance.
△ Less
Submitted 20 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes
Authors:
Youshao Xiao,
Lin Ju,
Zhenglei Zhou,
Siyuan Li,
Zhaoxin Huan,
Dalong Zhang,
Rujie Jiang,
Lin Wang,
Xiaolu Zhang,
Lei Liang,
Jun Zhou
Abstract:
Many distributed training techniques like Parameter Server and AllReduce have been proposed to take advantage of the increasingly large data and rich features. However, stragglers frequently occur in distributed training due to resource contention and hardware heterogeneity, which significantly hampers the training efficiency. Previous works only address part of the stragglers and could not adapti…
▽ More
Many distributed training techniques like Parameter Server and AllReduce have been proposed to take advantage of the increasingly large data and rich features. However, stragglers frequently occur in distributed training due to resource contention and hardware heterogeneity, which significantly hampers the training efficiency. Previous works only address part of the stragglers and could not adaptively solve various stragglers in practice. Additionally, it is challenging to use a systematic framework to address all stragglers because different stragglers require diverse data allocation and fault-tolerance mechanisms. Therefore, this paper proposes a unified distributed training framework called AntDT (Ant Distributed Training Framework) to adaptively solve the straggler problems. Firstly, the framework consists of four components, including the Stateful Dynamic Data Sharding service, Monitor, Controller, and Agent. These components work collaboratively to efficiently distribute workloads and provide a range of pre-defined straggler mitigation methods with fault tolerance, thereby hiding messy details of data allocation and fault handling. Secondly, the framework provides a high degree of flexibility, allowing for the customization of straggler mitigation solutions based on the specific circumstances of the cluster. Leveraging this flexibility, we introduce two straggler mitigation solutions, namely AntDT-ND for non-dedicated clusters and AntDT-DD for dedicated clusters, as practical examples to resolve various types of stragglers at Ant Group. Justified by our comprehensive experiments and industrial deployment statistics, AntDT outperforms other SOTA methods more than 3x in terms of training efficiency. Additionally, in Alipay's homepage recommendation scenario, using AntDT reduces the training duration of the ranking model from 27.8 hours to just 5.4 hours.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
Searches for multi-Z boson productions and anomalous gauge boson couplings at a muon collider
Authors:
Ruobing Jiang,
Chuqiao Jiang,
Alim Ruzi,
Tianyi Yang,
Yong Ban,
Qiang Li
Abstract:
Multi-boson productions can be exploited as novel probes either for standard model precision tests or new physics searches, and have become one of those popular topics in the ongoing LHC experiments, and in future collider studies, including those for electron-positron and muon-muon colliders. Here we focus on two examples, i.e., ZZZ direct productions through $μ^{+}μ^{-}$ annihilation at a 1 TeV…
▽ More
Multi-boson productions can be exploited as novel probes either for standard model precision tests or new physics searches, and have become one of those popular topics in the ongoing LHC experiments, and in future collider studies, including those for electron-positron and muon-muon colliders. Here we focus on two examples, i.e., ZZZ direct productions through $μ^{+}μ^{-}$ annihilation at a 1 TeV muon collider, and ZZ productions through vector boson scattering at a 10 TeV muon collider, with an integrated luminosity of $10 \, \text{ab}^{-1}$. Various channels are considered, including, such as $ZZZ \rightarrow 4l2ν$ and $ZZZ \rightarrow 4l + 2 \text{ jets}$, etc. Expected significance on these multi-Z boson production processes are provided based on a detailed Monte Carlo study and signal background analysis. Sensitives on anomalous gauge boson couplings are also presented.
△ Less
Submitted 28 May, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.