DOI: 10.1145/1379272.1379285
research-article

Index tuning for parameterized streaming groupby queries

Published: 29 March 2008

Abstract

Similar groupby queries are common in many stream processing applications. We propose the parameterized streaming groupby query template (PSGB template), an abstraction that represents a potentially infinite number of groupby queries instantiated at runtime, each with customized results. To handle high-speed data streams and large numbers of PSGB queries, the IMP index is proposed to organize the quickly evolving PSGB operator state in support of the query workload. In this paper, we tackle the IMP index tuning problem. We propose the EPrune algorithm, which is guaranteed to find the optimal IMP index configuration for a given query workload. Dynamic stream environments, however, require frequent index tuning, so the efficiency of index selection becomes more important than guaranteed optimality. We therefore design a greedy index selection algorithm named RGreedy and equip it with three heuristics: OWL, PCL, and Hybrid. In our experiments, RGreedy finds the optimal IMP configuration in practically all of our extensive test cases, and it terminates within seconds where EPrune takes hours.
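The trade-off between exhaustive and greedy index selection can be illustrated with a minimal sketch. This is not EPrune or RGreedy, whose details the abstract does not give, and it does not model the IMP index. The cost model (`toy_cost`), the workload, the attribute names, and the `budget` parameter are all assumptions made up for illustration: each candidate index is a single attribute, a query is cheap if any of its attributes is indexed, and every index adds a fixed maintenance cost.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable

Config = FrozenSet[str]

# Hypothetical workload: (attributes a query filters on, query frequency).
WORKLOAD = [({"symbol"}, 10), ({"region"}, 5), ({"symbol", "trader"}, 3)]


def toy_cost(config: Config) -> float:
    """Assumed cost model: maintenance per index plus per-query lookup cost."""
    total = 20.0 * len(config)  # fixed maintenance cost for each index
    for attrs, freq in WORKLOAD:
        # A query costs 1 unit if some index covers it, 10 units otherwise.
        total += freq * (1.0 if attrs & config else 10.0)
    return total


def exhaustive_select(candidates: Iterable[str],
                      cost: Callable[[Config], float],
                      budget: int) -> Config:
    """Evaluate every subset of at most `budget` indexes; always optimal."""
    pool = sorted(candidates)
    best: Config = frozenset()
    best_cost = cost(best)
    for k in range(1, budget + 1):
        for subset in combinations(pool, k):
            c = cost(frozenset(subset))
            if c < best_cost:
                best, best_cost = frozenset(subset), c
    return best


def greedy_select(candidates: Iterable[str],
                  cost: Callable[[Config], float],
                  budget: int) -> Config:
    """Add the single most beneficial index per round; stop when none helps."""
    pool = set(candidates)
    chosen: Config = frozenset()
    current = cost(chosen)
    while len(chosen) < budget:
        best_next, best_cost = None, current
        for cand in sorted(pool - chosen):
            c = cost(chosen | {cand})
            if c < best_cost:
                best_next, best_cost = cand, c
        if best_next is None:  # no remaining candidate lowers the cost
            break
        chosen, current = chosen | {best_next}, best_cost
    return chosen


CANDIDATES = ["symbol", "region", "trader"]
optimal = exhaustive_select(CANDIDATES, toy_cost, budget=3)
greedy = greedy_select(CANDIDATES, toy_cost, budget=3)
```

On this toy workload both searches return the same configuration, but the exhaustive search evaluates a number of subsets that grows exponentially with the number of candidates, while the greedy search evaluates at most `budget` rounds of single additions. A greedy search of this kind carries no optimality guarantee in general.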

Published In

SSPS '08: Proceedings of the 2nd international workshop on Scalable stream processing system
March 2008
99 pages
ISBN:9781595939630
DOI:10.1145/1379272
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. groupby
  2. index selection
  3. index tuning
  4. streaming data processing

Qualifiers

  • Research-article

Conference

EDBT '08
