skip to main content
10.1145/3592980.3595304acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Exploiting Access Pattern Characteristics for Join Reordering

Published: 18 June 2023 Publication History

Abstract

With increasing main memory sizes, data processing has significantly shifted from secondary storage to main memory. However, choosing a good join order is still very important for efficient query execution in modern DBMS. This choice bases mainly on cardinality estimates for intermediate join results. However, the memory access pattern, e.g., sequential or random, on the intermediate state is an often neglected performance factor.
In this paper, we examine this impact on join query performance by evaluating the execution time, and cache misses for n-ary foreign-key joins. Based on this analysis, we propose a novel join reordering algorithm that detects the memory access pattern (using machine learning on hardware performance counters) and adapts the join order accordingly at runtime. By considering the access pattern, our evaluation shows that our adaptive reorder algorithm converges quickly to a good join order and reaches improvements of up to a factor of 5.7 ×.

References

[1]
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https://www.tensorflow.org/ Software available from tensorflow.org.
[2]
Andi Kleen. [n. d.]. PMU Tools. Accessed Mar 9, 2023. https://github.com/andikleen/pmu-tools.
[3]
Ron Avnur and Joseph M. Hellerstein. 2000. Eddies: Continuously Adaptive Query Processing. SIGMOD Rec. 29, 2 (may 2000), 261–272. https://doi.org/10.1145/335191.335420
[4]
Philip A. Bernstein, Michael L. Brodie, Stefano Ceri, David J. DeWitt, Michael J. Franklin, Hector Garcia-Molina, Jim Gray, Gerald Held, Joseph M. Hellerstein, H. V. Jagadish, Michael E. Lesk, David Maier, Jeffrey F. Naughton, Hamid Pirahesh, Michael Stonebraker, and Jeffrey D. Ullman. 1998. The Asilomar report on database research. ArXiv cs.DB/9811013 (1998).
[5]
Burton H. Bloom. 1970. Space/Time Trade-Offs in Hash Coding with Allowable Errors. Commun. ACM 13, 7 (jul 1970), 422–426. https://doi.org/10.1145/362686.362692
[6]
Peter Boncz, M. Zukowski, and Niels Nes. 2005. MonetDB/X100: Hyper-Pipelining Query Execution. 2nd Biennial Conference on Innovative Data Systems Research, CIDR 2005.
[7]
Surajit Chaudhuri and Vivek Narasayya. 2007. Self-Tuning Database Systems: A Decade of Progress. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria) (VLDB ’07). VLDB Endowment, 3–14.
[8]
Songyun Duan, Vamsidhar Thummala, and Shivnath Babu. 2009. Tuning Database Configuration Parameters with ITuned. Proc. VLDB Endow. 2, 1 (aug 2009), 1246–1257. https://doi.org/10.14778/1687627.1687767
[9]
Dominik Durner, Viktor Leis, and Thomas Neumann. 2019. On the Impact of Memory Allocation on High-Performance Query Processing. In Proceedings of the 15th International Workshop on Data Management on New Hardware. ACM. https://doi.org/10.1145/3329785.3329918
[10]
Anshuman Dutt and Jayant R. Haritsa. 2014. Plan Bouquets: Query Processing without Selectivity Estimation. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (Snowbird, Utah, USA) (SIGMOD ’14). Association for Computing Machinery, New York, NY, USA, 1039–1050. https://doi.org/10.1145/2588555.2588566
[11]
Ronald Aylmer Fisher and Frank Yates. 1948. Statistical tables for biological, agricultural and medical research (3rd ed., rev. and enl ed.). Oliver and Boyd, London.
[12]
Xavier Glorot, Antoine Bordes, and Yoshua Bengio. 2011. Deep Sparse Rectifier Neural Networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics(Proceedings of Machine Learning Research, Vol. 15), Geoffrey Gordon, David Dunson, and Miroslav Dudík (Eds.). PMLR, Fort Lauderdale, FL, USA, 315–323. https://proceedings.mlr.press/v15/glorot11a.html
[13]
Anastasios Gounaris, Norman W. Paton, Alvaro A. A. Fernandes, and Rizos Sakellariou. 2002. Adaptive Query Processing: A Survey. In Advances in Databases, Barry Eaglestone, Siobhán North, and Alexandra Poulovassilis (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 11–25.
[14]
Philipp M Grulich, Breß Sebastian, Steffen Zeuch, Jonas Traub, Janis von Bleichert, Zongxiong Chen, Tilmann Rabl, and Volker Markl. 2020. Grizzly: Efficient stream processing through adaptive query compilation. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 2487–2503.
[15]
Trevor Hastie, Robert Tibshirani, Jerome Friedman, and James Franklin. 2004. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Vol. 27. 83–85 pages. https://doi.org/10.1007/BF02985802
[16]
J.K. Hughes. 1986. PL / I Structured Programming. Wiley. https://books.google.de/books?id=WCEiAQAAIAAJ
[17]
Intel. 2021. Intel® 64 and IA-32 ArchitecturesOptimization Reference Manual. Intel Corp.
[18]
Intel Corporation. [n. d.]. Intel® VTune™ Profiler. Accessed Mar 9, 2023. https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler.html.
[19]
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1412.6980
[20]
Thomas Kowalski, Fotios Kounelis, and Holger Pirk. 2020. High-Performance Tree Indices: Locality matters more than one would think. In International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, ADMS@VLDB 2020, Tokyo, Japan, August 31, 2020, Rajesh Bordawekar and Tirthankar Lahiri (Eds.). 1–7. http://www.adms-conf.org/2020-camera-ready/ADMS20_03.pdf
[21]
Hemalatha Kulala and K. Rani. 2017. Advancements in Multi-Layer Perceptron Training to Improve Classification Accuracy. International Journal on Recent and Innovation Trends in Computing and Communication 5 (06 2017), 353–357.
[22]
Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A Boncz, Thomas Neumann, and Alfons Kemper. 2016. Data blocks: Hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In Proceedings of the 2016 International Conference on Management of Data. 311–326.
[23]
Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter A. Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really?Proc. VLDB Endow. 9 (2015), 204–215.
[24]
Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The adaptive radix tree: ARTful indexing for main-memory databases. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, 38–49.
[25]
Quanzhong Li, Minglong Shao, Volker Markl, Kevin Beyer, Latha Colby, and Guy Lohman. 2007. Adaptively Reordering Joins during Query Execution. In 2007 IEEE 23rd International Conference on Data Engineering. IEEE. https://doi.org/10.1109/icde.2007.367848
[26]
Linux Kernel Organization, Inc.[n. d.]. Perf Wiki. Accessed Mar 9, 2023. https://perf.wiki.kernel.org/.
[27]
Volker Markl, Vijayshankar Raman, David Simmen, Guy Lohman, Hamid Pirahesh, and Miso Cilimdzic. 2004. Robust Query Processing through Progressive Optimization. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (Paris, France) (SIGMOD ’04). Association for Computing Machinery, New York, NY, USA, 659–670. https://doi.org/10.1145/1007568.1007642
[28]
Prashanth Menon, Amadou Ngom, Lin Ma, Todd C. Mowry, and Andrew Pavlo. 2020. Permutable Compiled Queries: Dynamically Adapting Compiled Queries without Recompiling. Proc. VLDB Endow. 14, 2 (Oct. 2020), 101–113. https://doi.org/10.14778/3425879.3425882
[29]
Adrian Michalke, Philipp M. Grulich, Clemens Lutz, Steffen Zeuch, and Volker Markl. 2021. An Energy-Efficient Stream Join for the Internet of Things. In Proceedings of the 17th International Workshop on Data Management on New Hardware (DaMoN 2021) (Virtual Event, China) (DAMON’21). Association for Computing Machinery, New York, NY, USA, Article 8, 6 pages. https://doi.org/10.1145/3465998.3466005
[30]
Thomas Neumann and Bernhard Radke. 2018. Adaptive Optimization of Very Large Join Queries. In Proceedings of the 2018 International Conference on Management of Data (Houston, TX, USA) (SIGMOD ’18). Association for Computing Machinery, New York, NY, USA, 677–692. https://doi.org/10.1145/3183713.3183733
[31]
Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. 2017. Self-Driving Database Management Systems. In CIDR 2017, Conference on Innovative Data Systems Research. https://db.cs.cmu.edu/papers/2017/p42-pavlo-cidr17.pdf
[32]
Shangfu Peng, Yin Yang, Zhenjie Zhang, Marianne Winslett, and Yong Yu. 2012. DP-tree. In Proceedings of the 2012 international conference on Management of Data - SIGMOD '12. ACM Press. https://doi.org/10.1145/2213836.2213972
[33]
Weibiao Qiao, Mohammad Khishe, and Sajjad Ravakhah. 2021. Underwater targets classification using local wavelet acoustic pattern and Multi-Layer Perceptron neural network optimized by modified Whale Optimization Algorithm. Ocean Engineering 219 (2021), 108415.
[34]
Suprio Ray, Catherine Higgins, Vaishnavi Anupindi, and Saransh Gautam. 2020. Enabling NUMA-aware Main Memory Spatial Join Processing: An Experimental Study. In International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, ADMS@VLDB 2020, Tokyo, Japan, August 31, 2020, Rajesh Bordawekar and Tirthankar Lahiri (Eds.). 1–8. http://www.adms-conf.org/2020-camera-ready/ADMS20_04.pdf
[35]
Bogdan Răducanu, Peter Boncz, and Marcin Zukowski. 2013. Micro Adaptivity in Vectorwise. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (New York, New York, USA) (SIGMOD ’13). Association for Computing Machinery, New York, NY, USA, 1231–1242. https://doi.org/10.1145/2463676.2465292
[36]
Jagsir Singh and Jaswinder Singh. 2021. Malware Classification Using Multi-layer Perceptron Model. In International Conference on Innovative Computing and Communications, Deepak Gupta, Ashish Khanna, Siddhartha Bhattacharyya, Aboul Ella Hassanien, Sameer Anand, and Ajay Jaiswal (Eds.). Springer Singapore, Singapore, 155–168.
[37]
Alan Jay Smith. 1982. Cache Memories. Comput. Surveys 14, 3 (sep 1982), 473–530. https://doi.org/10.1145/356887.356892
[38]
Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. 2001. LEO - DB2’s LEarning Optimizer. In Proceedings of the 27th International Conference on Very Large Data Bases(VLDB ’01). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 19–28.
[39]
Dan Terpstra, Heike Jagode, Haihang You, and Jack Dongarra. 2010. Collecting Performance Data with PAPI-C. In Tools for High Performance Computing 2009. Springer Berlin Heidelberg, 157–173. https://doi.org/10.1007/978-3-642-11261-4_11
[40]
Immanuel Trummer, Junxiong Wang, Deepak Maram, Samuel Moseley, Saehan Jo, and Joseph Antonakakis. 2019. SkinnerDB. In Proceedings of the 2019 International Conference on Management of Data. ACM. https://doi.org/10.1145/3299869.3300088
[41]
University of Tennessee. [n. d.]. Performance Application Programming Interface. Accessed Mar 9, 2023. https://icl.utk.edu/papi/.
[42]
Lucas Woltmann, Claudio Hartmann, Dirk Habich, and Wolfgang Lehner. 2022. Aggregate-based Training Phase for ML-based Cardinality Estimation. Datenbank-Spektrum 22, 1 (2022), 45–57. https://doi.org/10.1007/s13222-021-00400-z
[43]
Steffen Zeuch, Ankit Chaudhary, Bonaventura Monte, Haralampos Gavriilidis, Dimitrios Giouroukis, Philipp Grulich, Sebastian Breß, Jonas Traub, and Volker Markl. 2020. The NebulaStream Platform: Data and Application Management for the Internet of Things. In Conference on Innovative Data Systems Research (CIDR).
[44]
Steffen Zeuch, Holger Pirk, and Johann-Christoph Freytag. 2016. Non-invasive progressive optimization for in-memory databases. Proceedings of the VLDB Endowment 9, 14 (oct 2016), 1659–1670. https://doi.org/10.14778/3007328.3007332
[45]
Steffen Zeuch, Eleni Tzirita Zacharatou, Shuhao Zhang, Xenofon Chatziliadis, Ankit Chaudhary, Bonaventura Del Monte, Dimitrios Giouroukis, Philipp M Grulich, Ariane Ziehn, and Volker Mark. 2020. NebulaStream: Complex analytics beyond the cloud. The International Workshop on Very Large Internet of Things (VLIoT 2020) (2020).
[46]
Jianqiao Zhu, Navneet Potti, Saket Saurabh, and Jignesh M. Patel. 2017. Looking Ahead Makes Query Plans Robust: Making the Initial Case with in-Memory Star Schema Data Warehouse Workloads. Proc. VLDB Endow. 10, 8 (apr 2017), 889–900. https://doi.org/10.14778/3090163.3090167
[47]
Rong Zhu, Ziniu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, Zhengping Qian, Jingren Zhou, and Bin Cui. 2021. FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. arxiv:2011.09022 [cs.DB]

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
DaMoN '23: Proceedings of the 19th International Workshop on Data Management on New Hardware
June 2023
119 pages
ISBN:9798400701917
DOI:10.1145/3592980
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 June 2023

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SIGMOD/PODS '23
Sponsor:

Acceptance Rates

DaMoN '23 Paper Acceptance Rate 17 of 23 submissions, 74%;
Overall Acceptance Rate 94 of 127 submissions, 74%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)98
  • Downloads (Last 6 weeks)7
Reflects downloads up to 15 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media