
Deterministic sampling and range counting in geometric data streams

Published: 01 May 2007

Abstract

We present memory-efficient deterministic algorithms for constructing ϵ-nets and ϵ-approximations of streams of geometric data. Unlike probabilistic approaches, these deterministic samples provide guaranteed bounds on their approximation factors. We show how our deterministic samples can be used to answer approximate online iceberg geometric queries on data streams. We use these techniques to approximate several robust statistics of geometric data streams, including Tukey depth, simplicial depth, regression depth, the Theil-Sen estimator, and the least median of squares. Our algorithms use only a polylogarithmic amount of memory, provided the desired approximation factors are at least inverse-polylogarithmic. We also include a lower bound for noniceberg geometric queries.
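The idea of a deterministic ϵ-approximation maintained in small memory over a stream can be illustrated in the simplest possible setting. The sketch below is a hypothetical one-dimensional example, not the construction from this paper. It assumes the ranges are half-lines (−∞, x] and uses the classic merge-and-reduce (logarithmic) method, with sorted halving as the reduce step. The names `HalfLineApproximation`, `block_size`, `count_at_most`, and `error_bound` are illustrative only. Halving a sorted block of 2k points of weight w, by keeping every second point at weight 2w, changes any half-line count by at most w. The summary therefore carries a guaranteed rather than probabilistic error bound, which it tracks explicitly.

```python
import bisect
from typing import Dict, List


class HalfLineApproximation:
    """Deterministic merge-and-reduce summary of a stream of reals.

    Illustrative sketch only (hypothetical, not the paper's algorithm):
    ranges are restricted to half-lines (-inf, x], and the reduce step
    is plain sorted halving.
    """

    def __init__(self, block_size: int) -> None:
        if block_size < 1:
            raise ValueError("block_size must be positive")
        self.k = block_size
        self.buffer: List[float] = []             # raw points, each of weight 1
        self.levels: Dict[int, List[float]] = {}  # level i -> k sorted points of weight 2**i
        self.n = 0                                # number of stream items seen
        self.error_bound = 0                      # guaranteed bound on |estimate - true count|

    def insert(self, x: float) -> None:
        self.n += 1
        self.buffer.append(x)
        if len(self.buffer) == self.k:
            block, self.buffer = sorted(self.buffer), []
            self._carry(block, 0)

    def _carry(self, block: List[float], level: int) -> None:
        # Like incrementing a binary counter: two blocks at the same level
        # are merged and halved into one block at the next level.
        while level in self.levels:
            merged = sorted(self.levels.pop(level) + block)  # 2k points of weight 2**level
            block = merged[1::2]  # every second point, now of weight 2**(level + 1)
            self.error_bound += 2 ** level  # one halving shifts any half-line count by <= 2**level
            level += 1
        self.levels[level] = block

    def count_at_most(self, x: float) -> int:
        """Approximate number of stream items <= x."""
        total = sum(1 for p in self.buffer if p <= x)
        for level, block in self.levels.items():
            total += (2 ** level) * bisect.bisect_right(block, x)
        return total

    def memory(self) -> int:
        """Number of points currently stored."""
        return len(self.buffer) + sum(len(b) for b in self.levels.values())


if __name__ == "__main__":
    data = [(i * 7919) % 10007 for i in range(100_000)]
    summary = HalfLineApproximation(block_size=64)
    for v in data:
        summary.insert(v)
    for x in (1000, 5000, 9000):
        exact = sum(1 for v in data if v <= x)
        print(x, summary.count_at_most(x), exact, summary.error_bound, summary.memory())
```

With block size k and n items there are roughly log2(n/k) levels, each holding at most k points. The halvings performed at any one level contribute at most n/(2k) to the error. Memory is therefore O(k log(n/k)) and the additive error is at most about n·log2(n/k)/(2k), so the relative error shrinks inversely with k up to a logarithmic factor. A count for an interval [a, b] is the difference of two half-line counts, with at most twice the error. Geometric ranges such as halfplanes or simplices need the machinery the paper develops and are not covered by this sketch.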



Published In

ACM Transactions on Algorithms, Volume 3, Issue 2
May 2007
338 pages
ISSN:1549-6325
EISSN:1549-6333
DOI:10.1145/1240233

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Data streams
  2. epsilon nets
  3. geometric data
  4. iceberg queries
  5. range counting
  6. robust statistics
  7. sampling
  8. streaming algorithms

