Search | arXiv e-print repository

A Near-Optimal Algorithm for Convex Simple Bilevel Optimization under Weak Assumptions

Authors: Rujun Jiang, Xu Shi, Jiulin Wang

Abstract: Bilevel optimization provides a comprehensive framework that bridges single- and multi-objective optimization, encompassing various formulations, including standard nonlinear programs. This paper focuses on a specific class of bilevel optimization known as simple bilevel optimization. In these problems, the objective is to minimize a composite convex function over the optimal solution set of another composite convex minimization problem. By reformulating the simple bilevel problem as finding the left-most root of a nonlinear equation, we employ a bisection scheme to efficiently obtain a solution that is $ε$-optimal for both the upper- and lower-level objectives. In each iteration, the bisection narrows down an interval by assessing the feasibility of a discriminating criterion. By introducing a novel dual approach and employing the Accelerated Proximal Gradient (APG) method, we demonstrate that each subproblem in the bisection scheme can be solved in ${\mathcal{O}}(\sqrt{(L_{g_1}+2D_z L_{f_1}+1)/ε}|\logε|^2)$ oracle queries under weak assumptions. Here, $L_{f_1}$ and $L_{g_1}$ represent the Lipschitz constants of the gradients of the upper- and lower-level objectives' smooth components, and $D_z$ is the upper bound of the optimal multiplier of the subproblem. Considering the number of binary searches, the total complexity of our proposed method is ${\mathcal{O}}(\sqrt{(L_{g_1}+2D_z L_{f_1}+1)/ε}|\logε|^3)$. Our method achieves near-optimal complexity results, comparable to those in unconstrained smooth or composite convex optimization when disregarding the logarithmic terms. Numerical experiments also demonstrate the superior performance of our method compared to the state-of-the-art.
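The bisection idea mentioned in this abstract (narrowing an interval by testing a feasibility criterion to locate a left-most root) can be illustrated with a generic sketch. This is not the authors' algorithm: the function name `leftmost_feasible`, the monotone predicate `is_feasible`, and the toy threshold 0.3 are all hypothetical stand-ins, and the paper's actual criterion is evaluated by solving a subproblem rather than by a closed-form test.

```python
from typing import Callable


def leftmost_feasible(
    is_feasible: Callable[[float], bool],
    lo: float,
    hi: float,
    tol: float = 1e-8,
) -> float:
    """Return (up to tol) the smallest c in [lo, hi] with is_feasible(c) True.

    Assumption: is_feasible is monotone, i.e. False below some threshold
    and True at or above it, and is_feasible(hi) is True.
    """
    if not is_feasible(hi):
        raise ValueError("is_feasible(hi) must be True")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            hi = mid  # threshold lies in [lo, mid]
        else:
            lo = mid  # threshold lies in (mid, hi]
    return hi  # hi is always a feasible point


# Toy usage: the hypothetical criterion becomes feasible at c = 0.3.
threshold = leftmost_feasible(lambda c: c >= 0.3, 0.0, 1.0, tol=1e-6)
```

The interval halves at every step, so the number of criterion evaluations grows only logarithmically in `1/tol`. This is consistent with the abstract's total bound carrying one more logarithmic factor than its per-subproblem bound.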

Submitted 13 September, 2024; originally announced September 2024.

arXiv:2409.08659 [pdf, other]

A Modified Initial Mass Function of the First Stars with Explodability Theory under Different Enrichment Scenarios

Authors: Ruizheng Jiang, Gang Zhao, Haining Li, Qianfan Xing

Abstract: The most metal-poor stars record the earliest metal enrichment triggered by Population III stars. By comparing observed abundance patterns with theoretical yields of metal-free stars, physical properties of their first star progenitors can be inferred, including zero-age main-sequence mass and explosion energy. In this work, the initial mass distribution (IMF) of first stars is obtained from the largest analysis to date of 406 very metal-poor stars with the newest LAMOST/Subaru high-resolution spectroscopic observations. However, the mass distribution fails to be consistent with the Salpeter IMF, which is also reported by previous studies. Here we modify the standard power-law function with explodability theory. The mass distribution of Population III stars could be well explained by ensuring the initial metal enrichment to originate from successful supernova explosions. Based on the modified power-law function, we suggest an extremely top-heavy or nearly flat initial mass function with a large explosion energy exponent. This indicates that supernova explodability should be considered in the earliest metal enrichment process in the Universe.

Submitted 13 September, 2024; originally announced September 2024.

Comments: 15 pages, 5 figures, 1 table, accepted to ApJ

arXiv:2409.07018 [pdf, other]

Clustered Factor Analysis for Multivariate Spatial Data

Authors: Yanxiu Jin, Tomoya Wakayama, Renhe Jiang, Shonosuke Sugasawa

Abstract: Factor analysis has been extensively used to reveal the dependence structures among multivariate variables, offering valuable insight in various fields. However, it cannot incorporate the spatial heterogeneity that is typically present in spatial data. To address this issue, we introduce an effective method specifically designed to discover the potential dependence structures in multivariate spatial data. Our approach assumes that spatial locations can be approximately divided into a finite number of clusters, with locations within the same cluster sharing similar dependence structures. By leveraging an iterative algorithm that combines spatial clustering with factor analysis, we simultaneously detect spatial clusters and estimate a unique factor model for each cluster. The proposed method is evaluated through comprehensive simulation studies, demonstrating its flexibility. In addition, we apply the proposed method to a dataset of railway station attributes in the Tokyo metropolitan area, highlighting its practical applicability and effectiveness in uncovering complex spatial dependencies.

Submitted 11 September, 2024; originally announced September 2024.

arXiv:2409.05286 [pdf, other]

Seek and Solve Reasoning for Table Question Answering

Authors: Ruya Jiang, Chun Wang, Weihong Deng

Abstract: Table-based Question Answering (TQA) involves answering questions based on tabular data. The complexity of table structures and question logic makes this task difficult even for Large Language Models (LLMs). This paper improves TQA performance by leveraging LLMs' reasoning capabilities. Inspired by how humans solve TQA tasks, we propose a Seek-and-Solve pipeline that instructs the LLM to first seek relevant information and then answer questions. The two stages are integrated at the reasoning level, and their Chain of Thought (CoT) paths are integrated into a coherent Seek-and-Solve CoT (SS-CoT). Furthermore, we present a compact single-stage TQA-solving prompt distilled from the pipeline. Experiments demonstrate that under In-Context Learning settings, using samples with SS-CoT paths as demonstrations, the TQA-solving prompt can effectively guide the LLM to solve complex TQA tasks, resulting in improved performance and reliability. Our results highlight the importance of properly eliciting LLMs' reasoning capabilities in solving complex TQA tasks.

Submitted 8 September, 2024; originally announced September 2024.

arXiv:2408.16103 [pdf, other]

Orbital magnetoelectric coupling of three dimensional Chern insulators

Authors: Xin Lu, Renwen Jiang, Jianpeng Liu

Abstract: Orbital magnetoelectric effect is closely related to the band topology of bulk crystalline insulators. Typical examples include the half quantized Chern-Simons orbital magnetoelectric coupling in three dimensional (3D) axion insulators and topological insulators, which are the hallmarks of their nontrivial bulk band topology. While the Chern-Simons coupling is well defined only for insulators with zero Chern number, the orbital magnetoelectric effects in 3D Chern insulators with nonzero (layer) Chern numbers are still open questions. In this work, we propose a never-mentioned quantization rule for the layer-resolved orbital magnetoelectric response in 3D Chern insulators, the gradient of which is exactly quantized in units of $e^2/h$. By theoretical analysis and numerical simulations, we demonstrate that the quantized orbital magnetoelectric response remains robust for various types of interlayer hoppings and stackings, even against disorder and lack of symmetries. We argue that the robustness has a topological origin and is protected by the layer Chern number. It is thus promising to observe the proposed quantized orbital magnetoelectric response in a slab of 3D Chern insulator thanks to recent experimental developments.

Submitted 28 August, 2024; originally announced August 2024.

Comments: main text: 5 pages, 1 figure and 5 tables; SI: 15 pages, 5 figures and 2 tables

arXiv:2408.12594 [pdf, other]

Non-Homophilic Graph Pre-Training and Prompt Learning

Authors: Xingtong Yu, Jie Zhang, Yuan Fang, Renhe Jiang

Abstract: Graphs are ubiquitous for modeling complex relationships between objects across various fields. Graph neural networks (GNNs) have become a mainstream technique for graph-based applications, but their performance heavily relies on abundant labeled data. To reduce the labeling requirement, pre-training and prompt learning have become a popular alternative. However, most existing prompt methods do not differentiate homophilic and heterophilic characteristics of real-world graphs. In particular, many real-world graphs are non-homophilic, not strictly or uniformly homophilic with mixing homophilic and heterophilic patterns, exhibiting varying non-homophilic characteristics across graphs and nodes. In this paper, we propose ProNoG, a novel pre-training and prompt learning framework for such non-homophilic graphs. First, we analyze existing graph pre-training methods, providing theoretical insights into the choice of pre-training tasks. Second, recognizing that each node exhibits unique non-homophilic characteristics, we propose a conditional network to characterize the node-specific patterns in downstream tasks. Finally, we thoroughly evaluate and analyze ProNoG through extensive experiments on ten public datasets.

Submitted 30 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

Comments: Under review

arXiv:2408.09667 [pdf, other]

BLADE: Benchmarking Language Model Agents for Data-Driven Science

Authors: Ken Gu, Ruoxi Shang, Ruien Jiang, Keying Kuang, Richard-John Lin, Donghe Lyu, Yue Mao, Youran Pan, Teng Wu, Jiaqian Yu, Yikun Zhang, Tianmai M. Zhang, Lanyi Zhu, Mike A. Merrill, Jeffrey Heer, Tim Althoff

Abstract: Data-driven scientific discovery requires the iterative integration of scientific domain knowledge, statistical expertise, and an understanding of data semantics to make nuanced analytical decisions, e.g., about which variables, transformations, and statistical models to consider. LM-based agents equipped with planning, memory, and code execution capabilities have the potential to support data-driven science. However, evaluating agents on such open-ended tasks is challenging due to multiple valid approaches, partially correct steps, and different ways to express the same decisions. To address these challenges, we present BLADE, a benchmark to automatically evaluate agents' multifaceted approaches to open-ended research questions. BLADE consists of 12 datasets and research questions drawn from existing scientific literature, with ground truth collected from independent analyses by expert data scientists and researchers. To automatically evaluate agent responses, we developed corresponding computational methods to match different representations of analyses to this ground truth. Though language models possess considerable world knowledge, our evaluation shows that they are often limited to basic analyses. However, agents capable of interacting with the underlying data demonstrate improved, but still non-optimal, diversity in their analytical decision making. Our work enables the evaluation of agents for data-driven science and provides researchers deeper insights into agents' analysis approaches.

Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

arXiv:2408.06966 [pdf, other]

DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs

Authors: Dongyuan Li, Shiyin Tan, Ying Zhang, Ming Jin, Shirui Pan, Manabu Okumura, Renhe Jiang

Abstract: Dynamic graph learning aims to uncover evolutionary laws in real-world systems, enabling accurate social recommendation (link prediction) or early detection of cancer cells (classification). Inspired by the success of state space models, e.g., Mamba, for efficiently capturing long-term dependencies in language modeling, we propose DyG-Mamba, a new continuous state space model (SSM) for dynamic graph learning. Specifically, we first found that using inputs as control signals for SSM is not suitable for continuous-time dynamic network data with irregular sampling intervals, resulting in models being insensitive to time information and lacking generalization properties. Drawing inspiration from the Ebbinghaus forgetting curve, which suggests that memory of past events is strongly correlated with time intervals rather than specific details of the events themselves, we directly utilize irregular time spans as control signals for SSM to achieve significant robustness and generalization. Through exhaustive experiments on 12 datasets for dynamic link prediction and dynamic node classification tasks, we found that DyG-Mamba achieves state-of-the-art performance on most of the datasets, while also demonstrating significantly improved computation and memory efficiency.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.05563 [pdf, other]

Impacts of Darwinian Evolution on Pre-trained Deep Neural Networks

Authors: Guodong Du, Runhua Jiang, Senqiao Yang, Haoyang Li, Wei Chen, Keren Li, Sim Kuan Goh, Ho-Kin Tang

Abstract: Darwinian evolution of the biological brain is documented through multiple lines of evidence, although the modes of evolutionary changes remain unclear. Drawing inspiration from the evolved neural systems (e.g., visual cortex), deep learning models have demonstrated superior performance in visual tasks, among others. While the success of training deep neural networks has been relying on back-propagation (BP) and its variants to learn representations from data, BP does not incorporate the evolutionary processes that govern biological neural systems. This work proposes a neural network optimization framework based on evolutionary theory. Specifically, BP-trained deep neural networks for visual recognition tasks obtained from the ending epochs are considered the primordial ancestors (initial population). Subsequently, the population evolved with differential evolution. Extensive experiments are carried out to examine the relationships between Darwinian evolution and neural network optimization, including the correspondence between datasets, environment, models, and living species. The empirical results show that the proposed framework has positive impacts on the network, with reduced over-fitting and an order of magnitude lower time complexity compared to BP. Moreover, the experiments show that the proposed framework performs well on deep neural networks and big datasets.

Submitted 10 August, 2024; originally announced August 2024.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2408.05109 [pdf, other]

A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?

Authors: Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuyu Luo, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang

Abstract: Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of NL2SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of NL2SQL techniques powered by LLMs, covering its entire lifecycle from the following four aspects: (1) Model: NL2SQL translation techniques that tackle not only NL ambiguity and under-specification, but also properly map NL with database schema and instances; (2) Data: From the collection of training data, data synthesis due to training data scarcity, to NL2SQL benchmarks; (3) Evaluation: Evaluating NL2SQL methods from multiple angles using different metrics and granularities; and (4) Error Analysis: analyzing NL2SQL errors to find the root cause and guiding NL2SQL models to evolve. Moreover, we provide a rule of thumb for developing NL2SQL solutions. Finally, we discuss the research challenges and open problems of NL2SQL in the LLMs era.

Submitted 9 August, 2024; originally announced August 2024.

arXiv:2408.04570 [pdf, other]

Mathematical Programming For Adaptive Experiments

Authors: Ethan Che, Daniel R. Jiang, Hongseok Namkoong, Jimmy Wang

Abstract: Adaptive experimentation can significantly improve statistical power, but standard algorithms overlook important practical issues including batched and delayed feedback, personalization, non-stationarity, multiple objectives, and constraints. To address these issues, the current algorithm design paradigm crafts tailored methods for each problem instance. Since it is infeasible to devise novel algorithms for every real-world instance, practitioners often have to resort to suboptimal approximations that do not address all of their challenges. Moving away from developing bespoke algorithms for each setting, we present a mathematical programming view of adaptive experimentation that can flexibly incorporate a wide range of objectives, constraints, and statistical procedures. By formulating a dynamic program in the batched limit, our modeling framework enables the use of scalable optimization methods (e.g., SGD and auto-differentiation) to solve for treatment allocations. We evaluate our framework on benchmarks modeled after practical challenges such as non-stationarity, personalization, multi-objectives, and constraints. Unlike bespoke algorithms such as modified variants of Thompson sampling, our mathematical programming approach provides remarkably robust performance across instances.

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2408.04531 [pdf, other]

AExGym: Benchmarks and Environments for Adaptive Experimentation

Authors: Jimmy Wang, Ethan Che, Daniel R. Jiang, Hongseok Namkoong

Abstract: Innovations across science and industry are evaluated using randomized trials (a.k.a. A/B tests). While simple and robust, such static designs are inefficient or infeasible for testing many hypotheses. Adaptive designs can greatly improve statistical power in theory, but they have seen limited adoption due to their fragility in practice. We present a benchmark for adaptive experimentation based on real-world datasets, highlighting prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity. Our benchmark aims to spur methodological development that puts practical performance (e.g., robustness) as a central concern, rather than mathematical guarantees on contrived instances. We release an open source library, AExGym, which is designed with modularity and extensibility in mind to allow experimentation practitioners to develop custom environments and algorithms.

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2408.03841 [pdf, other]

MaxMind: A Memory Loop Network to Enhance Software Productivity based on Large Language Models

Authors: Yuchen Dong, XiaoXiang Fang, Yuchen Hu, Renshuang Jiang, Zhe Jiang

Abstract: The application of large language models to facilitate automated software operations and tool generation (SOTG), thus augmenting software productivity, mirrors the early stages of human evolution when the ability to create and use tools accelerated the progress of civilization. These complex tasks require AI to continuously summarize and improve. Current research often overlooks the importance of converting real-time task experiences into system memory and differentiating the value of existing knowledge for future reference. This paper addresses these issues by evolving external memory models into Memory-Loop Networks for timely memorization and experience referencing. We also enhance a RAG mechanism with knowledge precision segmentation to utilize memory based on value differentiation, and design the MaxMind model for SOTG accordingly. To demonstrate our approach, we developed MaxMind4Sheet, an electronic spreadsheet processing system aligned with the MaxMind philosophy. Comparative experiments with SheetCopilot have demonstrated that the accumulation and recycling of task memories lead to a steady enhancement in task success rate, with an improvement rate of approximately 3%-6% per round in this implementation example. Note that as the memories continue to grow, this cumulative improvement may be substantial. The inclusion of memory recycling can also boost the system's task execution efficiency by up to 25%, and it can address the retraining issue faced by LLMs when handling specialized tasks through memory transfer. These results suggest that MaxMind has significant potential to enhance the capabilities and productivity of LLM systems in SOTG.

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2407.16914 [pdf, other]

Learning to Solve Bilevel Programs with Binary Tender

Authors: Bo Zhou, Ruiwei Jiang, Siqian Shen

Abstract: Bilevel programs (BPs) find a wide range of applications in fields such as energy, transportation, and machine learning. As compared to BPs with continuous (linear/convex) optimization problems in both levels, the BPs with discrete decision variables have received much less attention, largely due to the ensuing computational intractability and the incapability of gradient-based algorithms for handling discrete optimization formulations. In this paper, we develop deep learning techniques to address this challenge. Specifically, we consider a BP with binary tender, wherein the upper and lower levels are linked via binary variables. We train a neural network to approximate the optimal value of the lower-level problem, as a function of the binary tender. Then, we obtain a single-level reformulation of the BP through a mixed-integer representation of the value function. Furthermore, we conduct a comparative analysis between two types of neural networks: general neural networks and the novel input supermodular neural networks, studying their representational capacities. To solve high-dimensional BPs, we introduce an enhanced sampling method to generate higher-quality samples and implement an iterative process to refine solutions. We demonstrate the performance of these approaches through extensive numerical experiments, whose lower-level problems are linear and mixed-integer programs, respectively.