skip to main content
research-article

The design of a query monitoring system

Published: 23 April 2009 Publication History

Abstract

Query monitoring refers to the problem of observing and predicting various parameters related to the execution of a query in a database system. In addition to being a useful tool for database users and administrators, it can also serve as an information collection service for resource allocation and adaptive query processing techniques. In this article, we present a query monitoring system from the ground up, describing various new techniques for query monitoring, their implementation inside a real database system, and a novel interface that presents the observed and predicted information in an accessible manner. To enable this system, we introduce several lightweight online techniques for progressively estimating and refining the cardinality of different relational operators using information collected at query execution time. These include binary and multiway joins as well as typical grouping operations and combinations thereof. We describe the various algorithms used to efficiently implement estimators and present the results of an evaluation of a prototype implementation of our framework in an open-source data management system. Our results demonstrate the feasibility and practical utility of the approach presented herein.

References

[1]
Alon, N., Gibbons, P. B., Matias, Y., and Szegedy, M. 2002. Tracking join and self-join sizes in limited storage. J. Comput. Syst. Sci. 64, 3, 719--747.
[2]
Babu, S., Bizarro, P., and DeWitt, D. 2005. Proactive re-optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 107--118.
[3]
Bickel, P. and Doksum, K. 2000. Mathematical Statistics: Basic Ideas and Selected Topics. Prentice Hall, Englewood Cliffs, NJ.
[4]
Bloom, B. H. 1970. Space/time trade-offs in hash coding with allowable errors. Communun. ACM 13, 7, 422--426.
[5]
Boneh, S., Boneh, A., and Caron, R. 1998. On estimating the prediction function and the number of unseen species in sampling with replacement. J. Amer. Statist. Assoc. 93, 441, 372--379.
[6]
Bruno, N. and Chaudhuri, S. 2002. Exploiting statistics on query expressions for optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 263--274.
[7]
Charikar, M., Chaudhuri, S., Motwani, R., and Narasayya, V. 2000. Towards estimation error guarantees for distinct values. In Proceedings of the ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, New York, 268--279.
[8]
Chaudhuri, S., Kaushik, R., and Ramamurthy, R. 2005. When can we trust progress estimators for SQL queries? In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 575--586.
[9]
Chaudhuri, S., Motwani, R., and Narasayya, V. 1999. On random sampling over joins. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 263--274.
[10]
Chaudhuri, S. and Narasayya, V. 2009. Program for TPC-D data generation with skew. ftp://ftp.research.microsoft.com/users/viveknar/tpcdskew.
[11]
Chaudhuri, S., Narasayya, V., and Ramamurthy, R. 2004. Estimating progress of execution for SQL queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 803--814.
[12]
Chen, S., Ailamaki, A., Gibbons, P. B., and Mowry, T. C. 2005. Inspector joins. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05). VLDB Endowment, 817--828.
[13]
Demaine, E. D., López-Ortiz, A., and Munro, J. I. 2002. Frequency estimation of Internet packet streams with limited space. In Proceedings of the 10th Annual European Symposium on Algorithms (ESA '02). Springer-Verlag, 348--360.
[14]
Estan, C. and Varghese, G. 2003. New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Trans. Comput. Syst. 21, 3, 270--313.
[15]
Ganguly, S., Gibbons, P. B., Matias, Y., and Silberschatz, A. 1996. Bifocal sampling for skew-resistant join size estimation. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 271--281.
[16]
Haas, P. and Stokes, L. 1998. Estimating the number of classes in a finite population. J. Amer. Statist. Assoc. 93, 444, 1475--1487.
[17]
Haas, P. J. 1997. Large-Sample and deterministic confidence intervals for online aggregation. In Proceedings of the 9th International Conference on Scientific and Statistical Database Management (SSDBM'97). IEEE Computer Society, 51--63.
[18]
Haas, P. J., Naughton, J. F., Seshadri, S., and Stokes, L. 1995. Sampling-Based estimation of the number of distinct values of an attribute. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB'95). Morgan Kaufmann, San Francisco, CA, 311--322.
[19]
Haas, P. J., Naughton, J. F., Seshadri, S., and Swami, A. N. 1993. Fixed-Precision estimation of join selectivity. In Proceedings of the 12th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '93). ACM, New York, 190--201.
[20]
Haas, P. J., Naughton, J. F., Seshadri, S., and Swami, A. N. 1996. Selectivity and cost estimation for joins based on random sampling. J. Comput. Syst. Sci. 52, 3, 550--569.
[21]
Haas, P. J. and Swami, A. N. 1992. Sequential sampling procedures for query size estimation. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 341--350.
[22]
Hellerstein, J. M., Haas, P. J., and Wang, H. J. 1997. Online aggregation. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 171--182.
[23]
Ioannidis, Y. 2003. The history of histograms (abridged). In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB ''03). VLDB Endowment, 19--30.
[24]
Ioannidis, Y. E. 1993. Universality of serial histograms. In Proceedings of the 19th International Conference on Very Large Data Bases (VLDB '93). Morgan Kaufmann, San Francisco, CA, 256--267.
[25]
Ioannidis, Y. E. and Christodoulakis, S. 1993. Optimal histograms for limiting worst-case error propagation in the size of join results. ACM Trans. Database Syst. 18, 4, 709--748.
[26]
Ioannidis, Y. E. and Poosala, V. 1995. Balancing histogram optimality and practicality for query result size estimation. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 233--244.
[27]
Jagadish, H. V., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K. C., and Suel, T. 1998. Optimal histograms with quality guarantees. In Proceedings of the 24th International Conference on Very Large Data Bases (VLDB '98). Morgan Kaufmann, San Francisco, CA, USA, 275--286.
[28]
Kabra, N. and DeWitt, D. J. 1998. Efficient mid-query re-optimization of sub-optimal query execution plans. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 106--117.
[29]
Karp, R. M., Shenker, S., and Papadimitriou, C. H. 2003. A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28, 51--55.
[30]
Luo, G., Chen, T., and Yu, H. 2007. Toward a progress indicator for program compilation. Soft. Practice Exper. 37, 9, 909--933.
[31]
Luo, G., Naughton, J. F., Ellmann, C. J., and Watzke, M. W. 2004. Toward a progress indicator for database queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 791--802.
[32]
Luo, G., Naughton, J. F., Ellmann, C. J., and Watzke, M. W. 2005. Increasing the accuracy and coverage of SQL progress indicators. In Proceedings of the 21st International Conference on Data Engineering (ICDE '05). IEEE Computer Society, 853--864.
[33]
Luo, G., Naughton, J. F., and Yu, P. S. 2006. Multi-Query SQL progress indicators. In Advances in Database Technology, Proceedings of the 10th International Conference on Extending Database Technology. Springer-Verlag, 921--941.
[34]
Manku, G. S. and Motwani, R. 2002. Approximate frequency counts over data streams. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB'02). VLDB Endowment, 346--357.
[35]
Markl, V., Raman, V., Simmen, D., Lohman, G., Pirahesh, H., and Cilimdzic, M. 2004. Robust query processing through progressive optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 659--670.
[36]
Mishra, C. and Koudas, N. 2007. A lightweight online framework for query progress indicators. In Proceedings of the 23rd International Conference on Data Engineering, (ICDE). IEEE, 1292--1296.
[37]
Mishra, C. and Volkovs, M. 2007. Conex: A system for monitoring queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 1076--1078.
[38]
Myers, B. A. 1985. The importance of percent-done progress indicators for computer-human interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '85). ACM, New York, 11--17.
[39]
Stillger, M., Lohman, G. M., Markl, V., and Kandil, M. 2001. Leo—DB2's learning optimizer. In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB '01). Morgan Kaufmann, San Francisco, CA, 19--28.
[40]
Thaper, N., Guha, S., Indyk, P., and Koudas, N. 2002. Dynamic multidimensional histograms. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 428--439.
[41]
Thompson, S. K. 2002. Sampling. Wiley Interscience.
[42]
Vitter, J. S. 1985. Random sampling with a reservoir. ACM Trans. Math. Softw. 11, 1, 37--57.

Cited By

View all

Index Terms

  1. The design of a query monitoring system

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Transactions on Database Systems
    ACM Transactions on Database Systems  Volume 34, Issue 1
    April 2009
    349 pages
    ISSN:0362-5915
    EISSN:1557-4644
    DOI:10.1145/1508857
    Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 April 2009
    Accepted: 01 August 2008
    Revised: 01 January 2008
    Received: 01 May 2007
    Published in TODS Volume 34, Issue 1

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. Query monitoring
    2. progress estimation

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)11
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 30 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Login options

    Full Access

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media