DOI: 10.1145/3514221.3526052

Deploying a Steered Query Optimizer in Production at Microsoft

Published: 11 June 2022

Abstract

Modern analytical workloads are highly heterogeneous and massively complex, making generic out-of-the-box query optimizers untenable for many customers and scenarios. As a result, it is important to specialize these optimizers to instances of the workloads. In this paper, we continue a recent line of work on steering a query optimizer towards better plans for a given workload, and make major strides in pushing previous research ideas to production deployment. Along the way, we solve several operational challenges, including making steering actions more manageable, keeping the costs of steering within budget, and avoiding unexpected performance regressions in production. Our resulting system, QO-Advisor, essentially externalizes the query planner to a massive offline pipeline for better exploration and specialization. We discuss various aspects of our design and show detailed results over production SCOPE workloads at Microsoft, where the system is currently enabled by default.

Published In

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN:9781450392495
DOI:10.1145/3514221

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. machine learning
  2. query optimization
  3. reinforcement learning
  4. scope

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '22

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
