DOI: 10.1145/3580305.3599505
research-article

SketchPolymer: Estimate Per-item Tail Quantile Using One Sketch

Published: 04 August 2023

Abstract

Estimating the quantiles of a distribution, especially its tail, is an interesting topic in data stream models that has attracted extensive research attention. In this paper, we propose a novel sketch, SketchPolymer, to accurately estimate per-item tail quantiles. SketchPolymer uses a technique called Early Filtration to filter out infrequent items, and another technique called VSS to reduce error. Our experimental results show that the accuracy of SketchPolymer is on average 32.67 times better than that of state-of-the-art techniques. We also implement SketchPolymer on P4 and FPGA platforms to verify its deployment flexibility. All our code is available on GitHub [1].

Supplementary Material

MP4 File (rtfp0273-2min-promo.mp4)
SketchPolymer presentation video

References

[1]
The source codes related to SketchPolymer. https://github.com/SketchPolymer/SketchPolymer-code.
[2]
Neal Cardwell, Stefan Savage, and Thomas Anderson. Modeling tcp latency. In Proceedings IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (Cat. No. 00CH37064), volume 3, pages 1742--1751. IEEE, 2000.
[3]
Jörg Liebeherr, Almut Burchard, and Florin Ciucu. Delay bounds in communication networks with heavy-tailed and self-similar traffic. IEEE Transactions on Information Theory, 58(2):1010--1024, 2012.
[4]
Pu Wang and Ian F Akyildiz. On the origins of heavy-tailed delay in dynamic spectrum access networks. IEEE Transactions on Mobile Computing, 11(2):204--217, 2011.
[5]
Saehoon Kim, Yuxiong He, Seung-won Hwang, Sameh Elnikety, and Seungjin Choi. Delayed-dynamic-selective (dds) prediction for reducing extreme tail latency in web search. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pages 7--16, 2015.
[6]
Kevin Zhao, Prateesh Goyal, Mohammad Alizadeh, and Thomas E Anderson. Scalable tail latency estimation for data center networks. arXiv preprint arXiv:2205.01234, 2022.
[7]
Jeffrey Dean and Luiz André Barroso. The tail at scale. Communications of the ACM, 56(2):74--80, 2013.
[8]
Joy Rahman and Palden Lama. Predicting the end-to-end tail latency of containerized microservices in the cloud. In 2019 IEEE International Conference on Cloud Engineering (IC2E), pages 200--210. IEEE, 2019.
[9]
Myungjin Lee, Nick Duffield, and Ramana Rao Kompella. Not all microseconds are equal: Fine-grained per-flow measurements with reference latency interpolation. In Proceedings of the ACM SIGCOMM 2010 conference, pages 27--38, 2010.
[10]
Muhammad Shahzad and Alex X Liu. Accurate and efficient per-flow latency measurement without probing and time stamping. IEEE/ACM Transactions on Networking, 24(6):3477--3492, 2016.
[11]
J Ian Munro and Mike S Paterson. Selection and sorting with limited storage. Theoretical computer science, 12(3):315--323, 1980.
[12]
Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G Lindsay. Approximate medians and other quantiles in one pass and with limited memory. ACM SIGMOD Record, 27(2):426--435, 1998.
[13]
Michael Greenwald and Sanjeev Khanna. Space-efficient online computation of quantile summaries. ACM SIGMOD Record, 30(2):58--66, 2001.
[14]
David Felber and Rafail Ostrovsky. A randomized online quantile summary in O((1/ε) log(1/ε)) words. Theory of Computing, 13(1):1--17, 2017.
[15]
Zohar Karnin, Kevin Lang, and Edo Liberty. Optimal quantile approximation in streams. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 71--78, 2016.
[16]
Fuheng Zhao, Sujaya Maiyya, Ryan Wiener, Divyakant Agrawal, and Amr El Abbadi. Kll± approximate quantile sketches over dynamic datasets. Proceedings of the VLDB Endowment, 14(7):1215--1227, 2021.
[17]
Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In Discrete Mathematics and Theoretical Computer Science, pages 137--156. Discrete Mathematics and Theoretical Computer Science, 2007.
[18]
Yang Zhou, Tong Yang, Jie Jiang, Bin Cui, Minlan Yu, Xiaoming Li, and Steve Uhlig. Cold filter: A meta-framework for faster and more accurate stream processing. In Proceedings of the 2018 International Conference on Management of Data, pages 741--756, 2018.
[19]
Kaicheng Yang, Yuanpeng Li, Zirui Liu, Tong Yang, Yu Zhou, Jintao He, Tong Zhao, Zhengyi Jia, Yongqiang Yang, et al. Sketchint: Empowering int with towersketch for per-flow per-switch measurement. In 2021 IEEE 29th International Conference on Network Protocols (ICNP), pages 1--12. IEEE, 2021.
[20]
Graham Cormode and Shan Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58--75, 2005.
[21]
Cristian Estan and George Varghese. New directions in traffic measurement and accounting. In Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications, pages 323--336, 2002.
[22]
Haoyu Li, Qizhi Chen, Yixin Zhang, Tong Yang, and Bin Cui. Stingy sketch: a sketch framework for accurate and fast frequency estimation. Proceedings of the VLDB Endowment, 15(7):1426--1438, 2022.
[23]
Tong Yang, Jie Jiang, Peng Liu, Qun Huang, Junzhi Gong, Yang Zhou, Rui Miao, Xiaoming Li, and Steve Uhlig. Elastic sketch: Adaptive and fast network-wide measurements. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, pages 561--575, 2018.
[24]
Ge Luo, Lu Wang, Ke Yi, and Graham Cormode. Quantiles over data streams: experimental comparisons, new analyses, and further improvements. The VLDB Journal, 25(4):449--472, 2016.
[25]
Charles Masson, Jee E Rim, and Homin K Lee. Ddsketch: A fast and fully-mergeable quantile sketch with relative-error guarantees. Proceedings of the VLDB Endowment, 12(12).
[26]
Jintao He, Jiaqi Zhu, and Qun Huang. Histsketch: A compact data structure for accurate per-key distribution monitoring. In 2023 IEEE 39th International Conference on Data Engineering (ICDE). IEEE, 2023.
[27]
Reinaldo Boris Arellano-Valle, Heleno Bolfarine, and Victor H Lachos. Skew-normal linear mixed models. Journal of data science, 3(4):415--438, 2005.
[28]
Kai Cheng, Limin Xiang, and Mizuho Iwaihara. Time-decaying bloom filters for data streams with skewed distributions. In 15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications (RIDE-SDMA'05), pages 63--69. IEEE, 2005.
[29]
Jing Gao, Wei Fan, Jiawei Han, and Philip S Yu. A general framework for mining concept-drifting data streams with skewed distributions. In Proceedings of the 2007 siam international conference on data mining, pages 3--14. SIAM, 2007.
[30]
Burton H Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422--426, 1970.
[31]
Rana Shahout, Roy Friedman, and Ran Ben Basat. Squad: combining sketching and sampling is better than either for per-item quantile estimation. arXiv preprint arXiv:2201.01958, 2022.
[32]
Jeffrey S Vitter. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS), 11(1):37--57, 1985.
[33]
Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. Efficient computation of frequent and top-k elements in data streams. In International conference on database theory, pages 398--412. Springer, 2005.
[34]
The source code of Bob Hash. http://burtleburtle.net/bob/hash/evahash.html.
[35]
The CAIDA Anonymized Internet Traces. http://www.caida.org/data/overview/.
[36]
Justin Cappos, Ivan Beschastnikh, Arvind Krishnamurthy, and Tom Anderson. Seattle: a platform for educational cloud computing. In Proceedings of the 40th ACM technical symposium on Computer science education, pages 111--115, 2009.
[37]
Rui Zhu, Bang Liu, Di Niu, Zongpeng Li, and Hong Vicky Zhao. Network latency estimation for personal devices: A matrix completion approach. IEEE/ACM Transactions on Networking, 25(2):724--737, 2016.
[38]
Alemnew Sheferaw Asrese, Steffie Jacob Eravuchira, Vaibhav Bajpai, Pasi Sarolahti, and Jörg Ott. Measuring web latency and rendering performance: Method, tools & longitudinal dataset. https://doi.org/10.5281/zenodo.2547512, January 2019.
[39]
Alemnew Sheferaw Asrese, Steffie Jacob Eravuchira, Vaibhav Bajpai, Pasi Sarolahti, and Jörg Ott. Measuring web latency and rendering performance: Method, tools, and longitudinal dataset. IEEE Transactions on Network and Service Management, 16(2):535--549, 2019.
[40]
Barefoot tofino: World's fastest p4-programmable ethernet switch asics. https://barefootnetworks.com/products/brief-tofino/.



Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
ISBN:9798400701030
DOI:10.1145/3580305
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data stream
  2. quantile estimation
  3. sketch
  4. tail quantile estimation

Qualifiers

  • Research-article

Funding Sources

  • Key-Area Research and Development Program of Guangdong Province
  • National Natural Science Foundation of China (NSFC)

Conference

KDD '23

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

