Improving range-sum query evaluation on data cubes via polynomial approximation

Author:

Alfredo Cuzzocrea

Data & Knowledge Engineering, Volume 56, Issue 2

Pages 85 - 121

https://doi.org/10.1016/j.datak.2005.03.011

Published: 01 February 2006

Abstract

Inefficient query answering is the main drawback of Decision Support Systems (DSS), owing to the very large size of the multidimensional data stored in the underlying Data Warehouse Server (DWS). Aggregate queries are the most frequent and useful kind of query for such systems, as they support several analyses based on the multidimensionality and multi-resolution of data. As a consequence, providing fast answers to aggregate queries (by trading off accuracy for efficiency, where possible) has become a very important requirement for improving the effectiveness of DSS-based applications. In this paper we present a technique, based on an analytical interpretation of multidimensional data and on the well-known least squares approximation (LSA) method, for supporting approximate aggregate query answering in OLAP, which represents the most common application interface for a DWS. Our technique consists of building data synopses by interpreting the original data distributions as a set of discrete functions. These synopses, called Δ-Syn, are obtained by approximating data with a set of polynomial coefficients and storing these coefficients instead of the original data. Queries are issued on the compressed representation, thus reducing the number of disk accesses needed to evaluate the answers.
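
The idea of storing polynomial coefficients in place of the data and answering range-sum queries from them can be illustrated with a minimal one-dimensional sketch. It is not the Δ-Syn structure itself, whose layout the abstract does not specify: the function names `fit_polynomial` and `approx_range_sum`, the single-bucket setting, and the use of normal equations solved by Gaussian elimination are all assumptions made for illustration.

```python
from typing import List, Sequence


def fit_polynomial(values: Sequence[float], degree: int) -> List[float]:
    """Least-squares fit of values[i] ~ sum_k c[k] * i**k (hypothetical helper).

    Builds the normal equations A c = b and solves them by Gaussian
    elimination with partial pivoting. Returns c[0..degree].
    """
    n = degree + 1
    m = len(values)
    # A[j][k] = sum_i i^(j+k); b[j] = sum_i values[i] * i^j
    A = [[float(sum(i ** (j + k) for i in range(m))) for k in range(n)]
         for j in range(n)]
    b = [float(sum(v * i ** j for i, v in enumerate(values))) for j in range(n)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = b[r] - sum(A[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = s / A[r][r]
    return coeffs


def approx_range_sum(coeffs: Sequence[float], lo: int, hi: int) -> float:
    """Estimate sum(values[lo..hi]), bounds inclusive, from coefficients only."""
    return sum(
        sum(c * i ** k for k, c in enumerate(coeffs))
        for i in range(lo, hi + 1)
    )


# Illustrative data: 16 cells that happen to follow a linear trend.
data = [3 * i + 2 for i in range(16)]
synopsis = fit_polynomial(data, degree=1)    # two numbers replace 16 cells
estimate = approx_range_sum(synopsis, 4, 9)  # answered without touching data
exact = sum(data[4:10])
```

On this deliberately linear data the estimate matches the exact range sum up to floating-point error; on skewed data the two differ, which is the accuracy-for-efficiency trade-off the abstract refers to. Because the fit includes a constant term, the estimate over the full range always reproduces the exact total.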

Index Terms

Improving range-sum query evaluation on data cubes via polynomial approximation
1. Applied computing
  1. Operations research
    1. Decision analysis
2. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction
  2. Information systems applications
    1. Decision support systems

Published In

Data & Knowledge Engineering, Volume 56, Issue 2

February 2006

109 pages

ISSN: 0169-023X

Publisher

Elsevier Science Publishers B. V.

Netherlands
