Abstract: Learned query optimizers, typically driven by deep learning models, have recently attracted wide attention because they can match or even outperform state-of-the-art commercial optimizers. A successful learned optimizer relies on a sufficient number of high-quality workload queries as training data, and poor-quality training data can cause the learned optimizer to fail on queries. In this paper, we propose AlphaQO, a novel training framework for robust learned query optimizers based on Reinforcement Learning (RL), which improves the robustness of an optimizer by finding bad queries in advance. AlphaQO is a loop system consisting of two main components: the query generator and the learned optimizer. The query generator aims to generate "difficult" queries (i.e., queries for which the learned optimizer provides poor estimates). The learned optimizer is trained on these generated queries and provides feedback (in the form of numerical rewards) to the query generator for its updates. If the generated queries are good, that is, difficult for the learned optimizer, the query generator receives a high reward; otherwise, it receives a low reward. This process is performed iteratively, with the main goal that, within a small budget, the learned optimizer can be trained to generalize well to a wide range of unseen queries. Extensive experiments show that AlphaQO can generate a relatively small number of queries and train a learned optimizer to outperform commercial optimizers. Moreover, a learned optimizer needs far fewer queries from AlphaQO than randomly generated queries to be trained well.
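The generator/optimizer loop described above can be illustrated with a deliberately small toy. This is a sketch under strong assumptions, not AlphaQO's implementation: the class names `ToyLearnedOptimizer` and `ToyQueryGenerator`, the function `run_loop`, the epsilon-greedy policy standing in for the RL generator, and the reduction of a query to an integer "type" with one scalar true cost are all hypothetical simplifications. It only shows the feedback structure: the generator is rewarded by the optimizer's estimation error, and the optimizer is trained on whatever the generator produces.

```python
import random


class ToyLearnedOptimizer:
    """Hypothetical stand-in: keeps one cost estimate per query type."""

    def __init__(self, n_types):
        self.estimates = [0.0] * n_types

    def error(self, q, true_cost):
        # Absolute estimation error for query type q.
        return abs(self.estimates[q] - true_cost)

    def train(self, q, true_cost, lr=0.5):
        # Move the estimate a fraction lr of the way toward the true cost.
        self.estimates[q] += lr * (true_cost - self.estimates[q])


class ToyQueryGenerator:
    """Hypothetical stand-in for the RL generator: epsilon-greedy over types."""

    def __init__(self, n_types, epsilon, seed):
        # Optimistic start: every type looks maximally rewarding until tried.
        self.values = [float("inf")] * n_types
        self.counts = [0] * n_types
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def generate(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))  # explore
        # Exploit: pick the type with the highest last observed reward.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, q, reward):
        self.counts[q] += 1
        self.values[q] = reward  # remember the most recent reward


def run_loop(true_costs, budget, epsilon=0.2, seed=0):
    opt = ToyLearnedOptimizer(len(true_costs))
    gen = ToyQueryGenerator(len(true_costs), epsilon, seed)
    for _ in range(budget):
        q = gen.generate()
        reward = opt.error(q, true_costs[q])  # high reward = difficult query
        gen.update(q, reward)                 # feedback to the generator
        opt.train(q, true_costs[q])           # optimizer learns from the query
    return opt, gen


if __name__ == "__main__":
    costs = [1.0, 8.0, 64.0]
    opt, gen = run_loop(costs, budget=60)
    print("queries per type:", gen.counts)
    print("remaining errors:", [opt.error(q, c) for q, c in enumerate(costs)])
```

In this toy, the generator spends more of its budget on the query types the optimizer estimates worst, which is the intuition behind spending a small query budget on difficult queries rather than on random ones.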