skip to main content
10.1145/564870.564880acmconferencesArticle/Chapter ViewAbstractPublication PagesspaaConference Proceedingsconference-collections
Article

Distributed streams algorithms for sliding windows

Published: 10 August 2002 Publication History

Abstract

This paper presents algorithms for estimating aggregate functions over a "sliding window" of the N most recent data items in one or more streams. Our results include
For a single stream, we present the first ε-approximation scheme for the number of 1's in a sliding window that is optimal in both worst case time and space. We also present the first ε for the sum of integers in [0.R] in a sliding window that is optimal in both worst case time and space (assuming R is at most polynomial in N). Both algorithms are deterministic and use only logarithmic memory words.
In contrast, we show that an deterministic algorithm that estimates, to within a small constant relative error, the number of 1's (or the sum of integers) in a sliding window over the union of distributed streams requires Ω(N) space.
We present the first randomized (ε,δ)-approximation scheme for the number of 1's in a sliding window over the union of distributed streams that uses only logarithmic memory words. We also present the first (ε,δ)-approximation scheme for the number of distinct values in a sliding window over distributed streams that uses only logarithmic memory words.
Our results are obtained using a novel family of synopsis data structures.

References

[1]
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. J. of Computer and System Sciences, 58:137--147, 1999.]]
[2]
B. Babcock, M. Datar, and R. Motwani. Sampling from a moving window over streaming data. In Proc. 13th ACM-SIAM Symp. on Discrete Algorithms, pages 633--634, Jan. 2002. Short paper.]]
[3]
Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In Proc. 13th ACM-SIAM Symp. on Discrete Algorithms, pages 623--632, Jan. 2002.]]
[4]
M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. In Proc. 13th ACM-SIAM Symp. on Discrete Algorithms, pages 635--644, Jan. 2002.]]
[5]
J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate L1-difference algorithm for massive data streams. In Proc. 40th IEEE Symp. on Foundations of Computer Science, pages 501--511, Oct. 1999.]]
[6]
J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. Testing and spot-checking of data streams. In Proc. 11th ACM-SIAM Symp. on Discrete Algorithms, pages 165--174, Jan. 2000.]]
[7]
P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. J. Computer and System Sciences, 31:182--209, 1985.]]
[8]
J. Fong and M. Strauss. An approximate Lp-difference algorithm for massive data streams. In Proc. 17th Symp. on Theoretical Aspects of Computer Science, LNCS 1770, pages 193--204. Springer, Feb. 2000.]]
[9]
J. Gehrke, F. Korn, and D. Srivastava. On computing correlated aggregates over continual data streams. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 13--24, May 2001.]]
[10]
P. B. Gibbons. Distinct sampling for highly-accurate answers to distinct values queries and event reports. In Proc. 27th International Conf. on Very Large Data Bases, pages 541--550, Sept. 2001.]]
[11]
P. B. Gibbons and Y. Matias. New sampling-based summary statistics for improving approximate query answers. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 331--342, June 1998.]]
[12]
P. B. Gibbons and Y. Matias. Synopsis data structures for massive data sets. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, pages 39--70. AMS, 1999. DIMACS: Series in Discrete Mathematics and Theoretical Computer Science, Vol. 50. A two page summary appeared as a short paper in SODA'99.]]
[13]
P. B. Gibbons and S. Tirthapura. Estimating simple functions on the union of data streams. In Proc. 13th ACM Symp. on Parallel Algorithms and Architectures, pages 281--290, July 2001.]]
[14]
A. C. Gilbert, Y. Kotidis, and M. J. Strauss. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In Proc. 27th International Conf. on Very Large Data Bases, pages 79--88, Sept. 2001.]]
[15]
M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proc. ACM SIGMOD International Conf. on Management of Data, pages 58--66, May 2001.]]
[16]
S. Guha, N. Koudas, and K. Shim. Data-streams and histograms. In Proc. 33rd ACM Symp. on Theory of Computing, pages 471--475, July 2001.]]
[17]
S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In Proc. 41st IEEE Symp. on Foundations of Computer Science, pages 359--366, Nov. 2000.]]
[18]
M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. Technical report, Digital Systems Research Center, Palo Alto, CA, May 1998.]]
[19]
P. Indyk. Stable distributions, pseudorandom generators, embeddings and data stream computation. In Proc. 41st IEEE Symp. on Foundations of Computer Science, pages 189--197, Nov. 2000.]]
[20]
P. Indyk. Personal communication, 2002.]]
[21]
I. Kremer, N. Nisan, and D. Ron. On randomized one-round communication complexity. Computational Complexity, 8(1):21--49, 1999.]]
[22]
E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, Cambridge, UK, 1997.]]
[23]
I. Newman. Private vs. common random bits in communication complexity. Information Processing Letters, 39:67--71, 1991.]]
[24]
I. Newman and M. Szegedy. Public vs. private coin flips in one round communication games. In Proc. 28th ACM Symp. on the Theory of Computing, pages 561--570, May 1996.]]
[25]
L. Trevisan. A note on counting distinct elements in the streaming model. Unpublished manuscript, April 2001.]]

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SPAA '02: Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
August 2002
302 pages
ISBN:1581135297
DOI:10.1145/564870
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 August 2002

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data streams
  2. distributed
  3. sliding windows
  4. waves

Qualifiers

  • Article

Conference

SPAA02

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)16
  • Downloads (Last 6 weeks)0
Reflects downloads up to 03 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media