skip to main content
article

Continuous queries over data streams

Published: 01 September 2001 Publication History

Abstract

In many recent applications, data may take the form of continuous data streams, rather than finite stored data sets. Several aspects of data management need to be reconsidered in the presence of data streams, offering a new research direction for the database community. In this paper we focus primarily on the problem of query processing, specifically on how to define and evaluate continuous queries over data streams. We address semantic issues as well as efficiency concerns. Our main contributions are threefold. First, we specify a general and flexible architecture for query processing in the presence of data streams. Second, we use our basic architecture as a tool to clarify alternative semantics and processing techniques for continuous queries. The architecture also captures most previous work on continuous queries and data streams, as well as related concepts such as triggers and materialized views. Finally, we map out research topics in the area of query processing over data streams, showing where previous work is relevant and describing problems yet to be addressed.

References

[1]
{AF00} M. Altinel and M. J. Franklin. Efficient filtering of XML documents for selective dissemination of information. In Proc. of the 2000 Intl. Conf. on Very Large Data Bases, pages 53-64, September 2000.
[2]
{AGP00} S. Acharya, P. B. Gibbons, and V. Poosala. Congressional samples for approximate answering of group-by queries. In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data, pages 487-498, May 2000.
[3]
{AGPR99} S. Acharya, P. B. Gibbons, V. Poosala, and S. Ramaswamy. Join synopses for approximate query answering. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 275-286, June 1999.
[4]
{AH00} R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data, pages 261-272, May 2000.
[5]
{AMS96} N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. In Proc. of the 1996 Annual ACM Symp. on Theory of Computing, pages 20-29, May 1996.
[6]
{B+97} D. Barbara et al. The New Jersey data reduction report. IEEE Data Engineering Bulletin, 20(4):3-45, 1997.
[7]
{Bar99} D. Barbara. The characterization of continuous queries. Intl. Journal of Cooperative Information Systems, 8(4):295-323, December 1999.
[8]
{BCL89} J. A. Blakeley, N. Coburn, and P. A. Larson. Updating derived relations: Detecting irrelevant and autonomously computable updates. ACM Trans. on Database Systems, 14(3):369-400, 1989.
[9]
{BGR01} S. Babu, M. N. Garofalakis, and R. Rastogi. SPARTAN: A model-based semantic compression system for massive data tables. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 283-294, May 2001.
[10]
{CDTW00} J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable continuous query system for internet databases. In Proc. of the 2000 ACM SIGMOD Intl. Conf. on Management of Data, pages 379-390, May 2000.
[11]
{CFPR00} C. Cortes, K. Fisher, D. Pregibon, and A. Rogers. Hancock: a language for extracting signatures from data streams. In Proc. of the 2000 ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pages 9-17, August 2000.
[12]
{CGRS00} K. Chakrabarti, M. N. Garofalakis, R. Rastogi, and K. Shim. Approximate query processing using wavelets. In Proc. of the 2000 Intl. Conf. on Very Large Data Bases, pages 111-122, September 2000.
[13]
{CMN99} S. Chaudhuri, R. Motwani, and V. R. Narasayya. On random sampling over joins. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 263-274, June 1999.
[14]
{DG00} N. G. Duffield and M. Grossglauser. Trajectory sampling for direct traffic observation. In Proc. of the 2000 ACM SIGCOMM, pages 271-284, September 2000.
[15]
{DH00} P. Domingos and G. Hulten. Mining high-speed data streams. In Proc. of the 2000 ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pages 71-80, August 2000.
[16]
{Fin82} S. J. Finkelstein. Common subexpression analysis in database applications. In Proc. of the 1982 ACM SIGMOD Intl. Conf. on Management of Data, pages 235-245, June 1982.
[17]
{FRM94} C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. In Proc. of the 1994 ACM SIGMOD Intl. Conf. on Management of Data, pages 419-429, May 1994.
[18]
{FW98} A. Fiat and G. J. Woeginger. Online Algorithms, The State of the Art. Springer-Verlag, Berlin, 1998.
[19]
{GJM96} A. Gupta, H. V. Jagadish, and I. S. Mumick. Data integration using self-maintainable views. In Proc. of the 1996 Intl. Conf. on Extending Database Technology, pages 140-144, March 1996.
[20]
{GK01} M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 58-66, May 2001.
[21]
{GKMS01} A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. Strauss. Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. In Proc. of the 2001 Intl. Conf. on Very Large Data Bases, September 2001.
[22]
{GKS01} J. Gehrke, F. Korn, and D. Srivastava. On computing correlated aggregates over continual data streams. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 13-24, May 2001.
[23]
{GM95} A. Gupta and I. S. Mumick. Maintenance of materialized views: Problems, techniques, and applications. IEEE Data Engineering Bulletin, 18(2):3-18, June 1995.
[24]
{GM99} P. B. Gibbons and Y. Matias. Synopsis data structures for massive data sets. In External Memory Algorithms, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, volume 50, 1999.
[25]
{GMLY98} H. Garcia-Molina, W. J. Labio, and J. Yang. Expiring data in a warehouse. In Proc. of the 1998 Intl. Conf. on Very Large Data Bases, pages 500-511, August 1998.
[26]
{GMMO00} S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In Proc. of the 2000 Annual Symp. on Foundations of Computer Science, pages 359-366, November 2000.
[27]
{GMP97} P. B. Gibbons, Y. Matias, and V. Poosala. Histogram-based approximation of set-valued query-answers. In Proc. of the 1997 Intl. Conf. on Very Large Data Bases, pages 466-475, August 1997.
[28]
{Gra90} Goetz Graefe. Encapsulation of parallelism in the volcano query processing system. In Proc. of the 1990 ACM SIGMOD Intl. Conf. on Management of Data, pages 102-111, May 1990.
[29]
{Gra93} G. Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2):73-170, 1993.
[30]
{HF+00} J. M. Hellerstein, M. J. Franklin, et al. Adaptive query processing: Technology in evolution. IEEE Data Engineering Bulletin, 23(2):7-18, June 2000.
[31]
{HH99} P. J. Haas and J. M. Hellerstein. Ripple joins for online aggregation. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 287-298, June 1999.
[32]
{HHW97} J. M. Hellerstein, P. J. Haas, and H. Wang. Online aggregation. In Proc. of the 1997 ACM SIGMOD Intl. Conf. on Management of Data, pages 171-182, May 1997.
[33]
{Hid99} C. Hidber. Online association rule mining. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 145-156, June 1999.
[34]
{HRR98} M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Technical Report TR-1998-011, Compaq Systems Research Center, Palo Alto, California, May 1998.
[35]
{HSD01} G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In Proc. of the 2001 ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, August 2001. (To appear).
[36]
{IFF+99} Z. G. Ives, D. Florescu, M. Friedman, A. Y. Levy, and D. S. Weld. An adaptive query execution system for data integration. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 299-310, June 1999.
[37]
{IP99} Y. E. Ioannidis and V. Poosala. Histogram-based approximation of set-valued query-answers. In Proc. of the 1999 Intl. Conf. on Very Large Data Bases, pages 174-185, September 1999.
[38]
{JMS95} H. V. Jagadish, I. S. Mumick, and A. Silberschatz. View maintenance issues for the Chronicle data model. In Proc. of the 1995 ACM Symp. on Principles of Database Systems, pages 113-124, May 1995.
[39]
{KGM95} B. Kao and H. Garcia-Molina. An overview of real-time database systems. In S. H. Son, editor, Advances in Real-Time Systems, pages 463-486. Prentice Hall, 1995.
[40]
{LPT99} L. Liu, C. Pu, and W. Tang. Continual queries for internet scale event-driven information delivery. IEEE Trans. on Knowledge and Data Engineering, 11(4):583-590, August 1999.
[41]
{MRL99} G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 251-262, June 1999.
[42]
{MVW00} Y. Matias, J. S. Vitter, and M. Wang. Dynamic maintenance of wavelet-based histograms. In Proc. of the 2000 Intl. Conf. on Very Large Data Bases, pages 101-110, September 2000.
[43]
{NACP01} B. Nguyen, S. Abiteboul, G. Cobena, and M. Preda. Monitoring XML data on the web. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 437-448, May 2001.
[44]
{PG99} V. Poosala and V. Ganti. Fast approximate answers to aggregate queries on a data cube. In Proc. of the 1999 Intl. Conf. on Scientific and Statistical Database Management, pages 24-33, July 1999.
[45]
{QGMW96} D. Quass, A. Gupta, I. S. Mumick, and J. Widom. Making views self-maintainable for data warehousing. In Proc. of the 1996 Intl. Conf. on Parallel and Distributed Information Systems, pages 158-169, December 1996.
[46]
{Se188} T. K. Sellis. Multiple-query optimization. ACM Trans. on Database Systems, 13(1):23-52, 1988.
[47]
{SLR94} P. Seshadri, M. Livny, and R. Ramakrishnan. Sequence query processing. In Proc. of the 1994 ACM SIGMOD Intl. Conf. on Management of Data, pages 430-441, May 1994.
[48]
{SPAM91} U. Schreier, H. Pirahesh, R. Agrawal, and C. Mohan. Alert: An architecture for transforming a passive DBMS into an active DBMS. In Proc. of the 1991 Intl. Conf. on Very Large Data Bases, pages 469-478, September 1991.
[49]
{STD+00} J. Shanmugasundaram, K. Tufte, D. J. DeWitt, J. F. Naughton, and D. Maier. Architecting a network query engine for producing partial results. In Proc. of the 2000 Intl. Workshop on the Web and Databases, pages 17-22, May 2000.
[50]
{Sul96} M. Sullivan. Tribeca: A stream database manager for network traffic analysis. In Proc. of the 1996 Intl. Conf. on Very Large Data Bases, page 594, September 1996.
[51]
{Tan96} A. S. Tanenbaum. Computer Networks. Prentice Hall, Upper Saddle River, New Jersey, 1996.
[52]
{Tea99} Times-Ten Team. In-memory data management for consumer transactions: The Times-Ten approach. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 528-529, June 1999.
[53]
{TGNO92} D. B. Terry, D. Goldberg, D. Nichols, and B. M. Oki. Continuous queries over append-only databases. In Proc. of the 1992 ACM SIGMOD Intl. Conf. on Management of Data, pages 321-330, June 1992.
[54]
{Tra} Traderbot home page. http://www.traderbot.com.
[55]
{UF01} T. Urhan and M. J. Franklin. Dynamic pipeline scheduling for improving interactive performance of online queries. In Proc. of the 2001 Intl. Conf. on Very Large Data Bases, September 2001. (To appear).
[56]
{UW97} J.D. Ullman and J. Widom. A First Course in Database Systems. Prentice Hall, Upper Saddle River, New Jersey, 1997.
[57]
{Vit85} J. S. Vitter. Random sampling with a reservoir. ACM Trans. on Mathematical Software, 11(1):37-57, March 1985.
[58]
{VW99} J. S. Vitter and M. Wang. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 193-204, June 1999.
[59]
{WA91} A. N. Wilschut and P. M. G. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. of the 1991 Intl. Conf. on Parallel and Distributed Information Systems, pages 68-77, December 1991.
[60]
{WC96} J. Widom and S. Ceri. Active Database Systems: Triggers and Rules for Advanced Database Processing. Morgan Kaufmann, San Francisco, California, 1996.
[61]
{XPA99} XML path language (XPath) version 1.0, November 1999. W3C Recommendation available at http://www.w3.org/TR/xpath.
[62]
{YSJ+00} B. Yi, N. Sidiropoulos; T. Johnson, H. V. Jagadish, C. Faloutsos, and A. Biliris. Online data mining for coevolving time sequences. In Proc. of the 2000 Intl. Conf. on Data Engineering, pages 13-22, March 2000.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 30, Issue 3
September 2001
97 pages
ISSN:0163-5808
DOI:10.1145/603867
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2001
Published in SIGMOD Volume 30, Issue 3

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)68
  • Downloads (Last 6 weeks)2
Reflects downloads up to 03 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media