Improving range-sum query evaluation on data cubes via polynomial approximation

Published: 01 February 2006

Abstract

Inefficient query answering is the main drawback of Decision Support Systems (DSS), owing to the very large size of the multidimensional data stored in the underlying Data Warehouse Server (DWS). Aggregate queries are the most frequent and useful kind of query for such systems, as they support several analyses based on the multidimensionality and multi-resolution of data. Consequently, providing fast answers to aggregate queries (trading off accuracy for efficiency, where possible) has become a very important requirement for improving the effectiveness of DSS-based applications. In this paper we present a technique, based on an analytical interpretation of multidimensional data and on the well-known least squares approximation (LSA) method, for supporting approximate aggregate query answering in OLAP, which represents the most common application interface for a DWS. Our technique consists of building data synopses by interpreting the original data distributions as a set of discrete functions. These synopses, called Δ-Syn, are obtained by approximating data with a set of polynomial coefficients and storing these coefficients instead of the original data. Queries are issued on the compressed representation, thus reducing the number of disk accesses needed to evaluate the answers.
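The idea in the abstract, replacing raw data with polynomial coefficients and answering range-sum queries from those coefficients, can be illustrated with a minimal one-dimensional sketch. This is not the authors' Δ-Syn construction, whose details are not given on this page; the function names `fit_polynomial` and `approx_range_sum`, the single-polynomial fit, and the use of array indices as the x-axis are all assumptions made for illustration.

```python
from typing import List, Sequence


def fit_polynomial(values: Sequence[float], degree: int) -> List[float]:
    """Least-squares fit of values[i] ~ sum_k c[k] * i**k (hypothetical helper).

    Assumes len(values) > degree so the normal equations are non-singular.
    """
    n = degree + 1
    xs = range(len(values))
    # Normal equations (V^T V) c = V^T y, where V[i][k] = i**k.
    a = [[float(sum(x ** (r + c) for x in xs)) for c in range(n)] for r in range(n)]
    b = [float(sum(y * x ** r for x, y in zip(xs, values))) for r in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        s = sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - s) / a[r][r]
    return coeffs


def approx_range_sum(coeffs: Sequence[float], lo: int, hi: int) -> float:
    """Estimate sum(values[lo..hi]), inclusive, using only the stored coefficients."""
    return sum(
        sum(c * i ** k for k, c in enumerate(coeffs))
        for i in range(lo, hi + 1)
    )


if __name__ == "__main__":
    data = [3.0, 5.2, 6.9, 9.1, 11.0, 12.8, 15.1, 17.0]  # made-up measures
    synopsis = fit_polynomial(data, degree=1)  # two numbers instead of eight
    print("exact :", sum(data[2:6]))
    print("approx:", approx_range_sum(synopsis, 2, 5))
```

The query function never touches the original values, which is the source of the saving in disk accesses. The sketch still loops over the queried range; summing each power of the index in closed form would make the cost depend only on the polynomial degree. Solving the normal equations directly is adequate for low degrees but becomes numerically fragile as the degree grows.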

Published In

Data & Knowledge Engineering  Volume 56, Issue 2
February 2006
109 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. OLAP
  2. approximate query answering
  3. data synopses
  4. multidimensional data management

Qualifiers

  • Article
