DOI: 10.1145/2591796.2591812
research-article

Turnstile streaming algorithms might as well be linear sketches

Published: 31 May 2014

Abstract

In the turnstile model of data streams, an underlying vector $x \in \{-m, -m+1, \ldots, m-1, m\}^n$ is presented as a long sequence of positive and negative integer updates to its coordinates. A randomized algorithm seeks to approximate a function $f(x)$ with constant probability while only making a single pass over this sequence of updates and using a small amount of space. All known algorithms in this model are linear sketches: they sample a matrix $A$ from a distribution on integer matrices in the preprocessing phase, and maintain the linear sketch $A \cdot x$ while processing the stream. At the end of the stream, they output an arbitrary function of $A \cdot x$. One cannot help but ask: are linear sketches universal?
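To make the linear-sketch framework above concrete, the following is a minimal Python sketch, assuming a toy distribution in which every entry of $A$ is drawn independently and uniformly from a small integer range (an illustrative choice, not the distribution used in the paper). The names `sample_matrix`, `process_stream`, and `apply_matrix` are hypothetical. It shows the property every linear sketch relies on: an update $x_i \leftarrow x_i + \Delta$ changes $A \cdot x$ by $\Delta$ times column $i$ of $A$, so the algorithm never has to store $x$ itself.

```python
import random


def sample_matrix(rows, n, bound, rng):
    """Toy distribution (an assumption): entries i.i.d. uniform in [-bound, bound]."""
    return [[rng.randint(-bound, bound) for _ in range(n)] for _ in range(rows)]


def process_stream(A, updates):
    """Maintain A·x under turnstile updates (i, delta) without ever storing x."""
    sketch = [0] * len(A)
    for i, delta in updates:
        # x_i += delta shifts A·x by delta times column i of A.
        for r, row in enumerate(A):
            sketch[r] += delta * row[i]
    return sketch


def apply_matrix(A, x):
    """Compute A·x directly from the final vector (used only for checking)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


rng = random.Random(0)
n = 8
A = sample_matrix(rows=3, n=n, bound=5, rng=rng)

# A short turnstile stream of (coordinate, signed increment) pairs.
updates = [(2, 4), (5, -1), (2, -3), (7, 2), (5, 1)]
sketch = process_stream(A, updates)

# Rebuild x only to verify that the streamed sketch equals A·x.
x = [0] * n
for i, delta in updates:
    x[i] += delta
assert sketch == apply_matrix(A, x)
```

Because the sketch depends only on the final vector $x$, any reordering of the same updates yields the same $A \cdot x$; the algorithm's answer is then computed from this vector alone.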
In this work we answer this question by showing that any 1-pass constant probability streaming algorithm for approximating an arbitrary function $f$ of $x$ in the turnstile model can also be implemented by sampling a matrix $A$ from the uniform distribution on $O(n \log m)$ integer matrices, with entries of magnitude $\mathrm{poly}(n)$, and maintaining the linear sketch $A \cdot x$. Furthermore, the logarithm of the number of possible states of $A \cdot x$, as $x$ ranges over $\{-m, -m+1, \ldots, m\}^n$, plus the amount of randomness needed to store $A$, is at most a logarithmic factor larger than the space required of the space-optimal algorithm. Our result shows that to prove space lower bounds for 1-pass streaming algorithms, it suffices to prove lower bounds in the simultaneous model of communication complexity, rather than the stronger 1-way model. Moreover, the fact that we can assume we have a linear sketch with polynomially-bounded entries further simplifies existing lower bounds; e.g., for frequency moments we present a simpler proof of the $\Omega(n^{1-2/k})$ bit complexity lower bound without using communication complexity.
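The reduction to the simultaneous model rests on linearity: if a stream is split between two players holding $x_1$ and $x_2$, then $A \cdot (x_1 + x_2) = A \cdot x_1 + A \cdot x_2$, so each player can send its own sketch to a referee independently, with no message passed between the players. The toy Python sketch below illustrates only this identity and the space measure mentioned above (the logarithm of the number of distinct values of $A \cdot x$, computed by brute force for tiny $n$ and $m$). It is not the paper's construction: the shared seed stands in for public randomness, the entry distribution is an assumption, and all names are hypothetical.

```python
import itertools
import math
import random


def sample_matrix(rows, n, bound, seed):
    """Toy distribution (an assumption); the shared seed models public randomness."""
    rng = random.Random(seed)
    return [[rng.randint(-bound, bound) for _ in range(n)] for _ in range(rows)]


def apply_matrix(A, x):
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)


def simultaneous_protocol(x1, x2, rows, bound, seed):
    """Each player sends only A·(own vector); the referee adds the two messages."""
    A = sample_matrix(rows, len(x1), bound, seed)
    msg1 = apply_matrix(A, x1)  # player 1's single message
    msg2 = apply_matrix(A, x2)  # player 2's single message, computed independently
    return tuple(a + b for a, b in zip(msg1, msg2))


def state_bits(A, m):
    """log2 of the number of distinct values of A·x over x in {-m, ..., m}^n."""
    n = len(A[0])
    states = {apply_matrix(A, x) for x in itertools.product(range(-m, m + 1), repeat=n)}
    return math.log2(len(states))


x1 = [1, 0, -2, 3]
x2 = [0, 2, 2, -1]
combined = [a + b for a, b in zip(x1, x2)]
A = sample_matrix(rows=2, n=4, bound=3, seed=42)

# The referee's sum equals the sketch of the combined vector.
assert simultaneous_protocol(x1, x2, rows=2, bound=3, seed=42) == apply_matrix(A, combined)

# Tiny brute-force example: at most 3**4 = 81 states, so at most log2(81) bits.
bits = state_bits(A, m=1)
```

By contrast, a general 1-pass algorithm only yields a 1-way protocol, in which the first player hands its memory state to the second; linearity is what lets both messages be computed without that hand-off.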

Supplementary Material

MP4 File (p174-sidebyside.mp4)

    Published In

    STOC '14: Proceedings of the forty-sixth annual ACM symposium on Theory of computing
    May 2014
    984 pages
    ISBN:9781450327107
    DOI:10.1145/2591796
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. communication complexity
    2. linear sketches
    3. lower bounds
    4. streaming algorithms

    Conference

    STOC '14
    STOC '14: Symposium on Theory of Computing
    May 31 - June 3, 2014
    New York, New York

    Acceptance Rates

    STOC '14 Paper Acceptance Rate 91 of 319 submissions, 29%;
    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%
