Enabling Adaptive Sampling for Intra-Window Join: Simultaneously Optimizing Quantity and Quality

Published: 30 September 2024

Abstract

Sampling is one of the most widely employed approximations in big data processing. Among the various challenges in sampling design, sampling for join is particularly intriguing yet complex. The problem starts with a classical case: the join of two Bernoulli samples shrinks its output size quadratically and depends strongly on the input data, so adaptive sampling is needed to guarantee both the quantity and the quality of the sampled data. The community has made strides toward this goal by constructing offline samples and integrating support from indexes or key frequencies. However, stream data demands real-time processing and high-quality analysis, so methods developed for static data are no longer applicable. Consequently, a fundamental question arises: Is it possible to achieve adaptive sampling in stream data without relying on offline techniques?
To address this problem, we propose FreeSam, which couples hybrid sampling with intra-window join, a key stream join operator. Our focus lies on two widely used metrics: output size, which captures quantity, and variance, which captures quality. FreeSam enables adaptability in both the desired quantity and quality of data sampling by offering control over the two-dimensional space spanned by these metrics. Meanwhile, adjustable trade-offs between quality and performance make FreeSam practical for use. Our experiments show that, for every 1% increase in the latency limit, FreeSam can yield a 3.83% increase in the output size while maintaining the level of the estimator's variance. Additionally, we provide a multi-core implementation of FreeSam and ensure the predictability of its latency through both an analytic model and a neural network model. The accuracy of these models is 88.05% and 96.75%, respectively.
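The quadratic shrinkage mentioned in the abstract can be illustrated with a small sketch. It is not FreeSam and is not taken from the paper: the function names (`join_size`, `bernoulli_sample`, `expected_sampled_join_size`) and the toy key distribution are assumptions made purely for illustration. The sketch relies only on the standard fact that, under independent Bernoulli sampling with rates p and q, a joined pair survives only when both of its tuples are kept, so the expected output size is p · q times the full join size (p² when both rates are equal).

```python
import random
from collections import Counter


def join_size(r_keys, s_keys):
    """Exact equi-join output size: sum over keys of freq_R(k) * freq_S(k)."""
    r_freq, s_freq = Counter(r_keys), Counter(s_keys)
    return sum(count * s_freq[key] for key, count in r_freq.items())


def bernoulli_sample(keys, p, rng):
    """Keep each tuple independently with probability p."""
    return [key for key in keys if rng.random() < p]


def expected_sampled_join_size(r_keys, s_keys, p, q):
    """A joined pair survives only if both tuples are kept: probability p * q."""
    return p * q * join_size(r_keys, s_keys)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed, illustrative only
    # Hypothetical toy streams: 2000 tuples each, 50 distinct join keys.
    r_keys = [i % 50 for i in range(2000)]
    s_keys = [i % 50 for i in range(2000)]

    full = join_size(r_keys, s_keys)
    for rate in (0.5, 0.1):
        observed = join_size(
            bernoulli_sample(r_keys, rate, rng),
            bernoulli_sample(s_keys, rate, rng),
        )
        expected = expected_sampled_join_size(r_keys, s_keys, rate, rate)
        print(f"rate={rate}: full={full}, expected={expected:.0f}, observed={observed}")
```

The observed size in a single run fluctuates around the expectation, and the size of that fluctuation depends on how the keys are distributed, which is the data dependency the abstract refers to.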

References

[1]
2000. Sampling: Design and Analysis. Technometrics (2000).
[2]
2018. Interval Join in Apache Flink. Retrieved March 19, 2022 from https://nightlies.apache.org/flink/flink-docsrelease- 1.14/docs/dev/datastream/operators/joining/
[3]
Daniel J. Abadi, Donald Carney, Ugur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stanley B. Zdonik. 2003. Aurora: a new model and architecture for data stream management. The VLDB Journal 12 (2003), 120--139.
[4]
Fatima Abdullah, Limei Peng, and Byungchul Tak. 2021. A Survey of IoT Stream Query Execution Latency Optimization within Edge and Cloud. Wireless Communications and Mobile Computing 2021 (2021), 1--16.
[5]
Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, and Sridhar Ramaswamy. 1999. Join Synopses for Approximate Query Answering. SIGMOD Rec. 28, 2 (jun 1999), 275--286. https://doi.org/10.1145/304181.304207
[6]
Sameer Agarwal, Barzan Mozafari, Aurojit Panda, Henry Milner, Samuel Madden, and Ion Stoica. 2013. BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data. In Proceedings of the 8th ACM European Conference on Computer Systems (Prague, Czech Republic) (EuroSys '13). Association for Computing Machinery, New York, NY, USA, 29--42. https://doi.org/10.1145/2465351.2465355
[7]
Charu C. Aggarwal. 2006. On Biased Reservoir Sampling in the Presence of Stream Evolution. In Proceedings of the 32nd International Conference on Very Large Data Bases (Seoul, Korea) (VLDB '06). VLDB Endowment, 607--618.
[8]
Asaad Althoubi, Reem Alshahrani, and Hassan Peyravi. 2021. Delay Analysis in IoT Sensor Networks. Sensors 21, 11 (2021). https://doi.org/10.3390/s21113876
[9]
Jieliang Ang, Tianyuan Fu, Johns Paul, Shuhao Zhang, Bingsheng He, Teddy Wenceslao, and Sien Tan. 2019. TraV: An Interactive Exploration System for Massive Trajectory Data. 309--313. https://doi.org/10.1109/BigMM.2019.000--4
[10]
Albert Atserias, Martin Grohe, and Dániel Marx. 2013. Size Bounds and Query Plans for Relational Joins. SIAM J. Comput. 42, 4 (2013), 1737--1767. https://doi.org/10.1137/110859440 arXiv:https://doi.org/10.1137/110859440
[11]
Brian Babcock, Surajit Chaudhuri, and Gautam Das. 2003. Dynamic sample selection for approximate query processing. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (San Diego, California) (SIGMOD '03). Association for Computing Machinery, New York, NY, USA, 539--550. https://doi.org/10.1145/872757. 872822
[12]
Cagri Balkesen, Gustavo Alonso, Jens Teubner, and M. Tamer Özsu. 2013. Multi-Core, Main-Memory Joins: Sort svs. Hash Revisited. Proc. VLDB Endow. 7, 1 (sep 2013), 85--96. https://doi.org/10.14778/2732219.2732227
[13]
Paul Beame, Paraschos Koutris, and Dan Suciu. 2014. Skew in Parallel Query Processing. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (Snowbird, Utah, USA) (PODS '14). Association for Computing Machinery, New York, NY, USA, 212--223. https://doi.org/10.1145/2594538.2594558
[14]
Spyros Blanas, Yinan Li, and Jignesh M. Patel. 2011. Design and Evaluation of Main Memory Hash Join Algorithms for Multi-Core CPUs. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (Athens, Greece) (SIGMOD '11). Association for Computing Machinery, New York, NY, USA, 37--48. https://doi.org/10.1145/ 1989323.1989328
[15]
Walter Cai, Magdalena Balazinska, and Dan Suciu. 2019. Pessimistic Cardinality Estimation: Tighter Upper Bounds for Intermediate Join Cardinalities. In Proceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands) (SIGMOD '19). Association for Computing Machinery, New York, NY, USA, 18--35. https: //doi.org/10.1145/3299869.3319894
[16]
Moses Charikar, Kevin Chen, and Martin Farach-Colton. 2002. Finding Frequent Items in Data Streams. In Proceedings of the 29th International Colloquium on Automata, Languages and Programming (ICALP '02). Springer-Verlag, Berlin, Heidelberg, 693--703.
[17]
Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya. 1999. On Random Sampling over Joins. SIGMOD Record (ACM Special Interest Group on Management of Data) 28, 2 (1999), 263--273. https://doi.org/10.1145/304181.304206
[18]
Yu Chen and Ke Yi. 2017. Two-Level Sampling for Join Size Estimation. In Proceedings of the 2017 ACM International Conference on Management of Data (Chicago, Illinois, USA) (SIGMOD '17). Association for Computing Machinery, New York, NY, USA, 759--774. https://doi.org/10.1145/3035918.3035921
[19]
Y. Chen and Ke Yi. 2020. Random Sampling and Size Estimation Over Cyclic Joins. In International Conference on Database Theory.
[20]
Yanqiu Chen, Linjiang Zheng, andWeining Liu. 2020. Performance-Sensitive Data Distribution Method for Distributed Stream Processing Systems. In Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference & 2020 3rd International Conference on Big Data and Artificial Intelligence (Qingdao, China) (HPCCT & BDAI '20). Association for Computing Machinery, New York, NY, USA, 212--217. https://doi.org/10.1145/3409501.3409536
[21]
Yu Cheng, Weijie Zhao, and Florin Rusu. 2017. Bi-level online aggregation on raw data. ACM International Conference Proceeding Series Part F1286 (2017). https://doi.org/10.1145/3085504.3085514
[22]
Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, and Pradeep Dubey. 2008. Efficient Implementation of Sorting on Multi-Core SIMD CPU Architecture. Proc. VLDB Endow. 1, 2 (aug 2008), 1313--1324. https://doi.org/10.14778/1454159.1454171
[23]
R. Cirstea, B. Yang, C. Guo, T. Kieu, and S. Pan. 2022. Towards Spatio- Temporal Aware Traffic Time Series Forecasting. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE Computer Society, Los Alamitos, CA, USA, 2900--2913. https://doi.org/10.1109/ICDE53745.2022.00262
[24]
Graham Cormode, Minos Garofalakis, Peter J. Haas, and Chris Jermaine. 2011. Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches.
[25]
Graham Cormode, Minos Garofalakis, and Dimitris Sacharidis. 2006. Fast Approximate Wavelet Tracking on Streams. In Advances in Database Technology - EDBT 2006, Yannis Ioannidis, Marc H. Scholl, Joachim W. Schmidt, Florian Matthes, Mike Hatzopoulos, Klemens Boehm, Alfons Kemper, Torsten Grust, and Christian Boehm (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 4--22.
[26]
Graham Cormode and S. Muthukrishnan. 2004. An improved data stream summary: The Count-Min sketch and its applications. J. Algorithms 55 (2004), 29--38.
[27]
James Croft. 2024. Identifying drift in ML models: Best practices for generating consistent, reliable responses. Retrieved April, 2024 from https://techcommunity.microsoft.com/t5/fasttrack-for-azure/identifying-drift-in-ml-models-bestpractices- for-generating/ba-p/4040531
[28]
Abhinandan Das, Johannes Gehrke, and Mirek Riedewald. 2003. Approximate join processing over data streams. (2003), 40. https://doi.org/10.1145/872763.872765
[29]
A. Das, J. Gehrke, and M. Riedewald. 2005. Semantic approximation of data stream joins. IEEE Transactions on Knowledge and Data Engineering 17, 1 (2005), 44--59. https://doi.org/10.1109/TKDE.2005.17
[30]
Shiyuan Deng, Shangqi Lu, and Yufei Tao. 2023. On Join Sampling and the Hardness of Combinatorial Output- Sensitive Join Algorithms. In Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (Seattle, WA, USA) (PODS '23). Association for Computing Machinery, New York, NY, USA, 99--111. https://doi.org/10.1145/3584372.3588666
[31]
Jens-Peter Dittrich, Bernhard Seeger, David Scot Taylor, and Peter Widmayer. 2002. Progressive Merge Join: A Generic and Non-Blocking Sort-Based Join Algorithm. In Proceedings of the 28th International Conference on Very Large Data Bases (Hong Kong, China) (VLDB '02). VLDB Endowment, 299--310.
[32]
Marina Drosou and Evaggelia Pitoura. 2010. Search Result Diversification. SIGMOD Rec. 39, 1 (sep 2010), 41--47. https://doi.org/10.1145/1860702.1860709
[33]
Pavlos Efraimidis. 2010. Weighted Random Sampling over Data Streams. (12 2010). https://doi.org/10.1007/978--3- 319--24024--4_12
[34]
Pavlos S. Efraimidis and Paul G. Spirakis. 2006. Weighted random sampling with a reservoir. Inform. Process. Lett. 97, 5 (2006), 181--185. https://doi.org/10.1016/j.ipl.2005.11.003
[35]
Mohammed Elseidy, Abdallah Elguindy, Aleksandar Vitorovic, and Christoph Koch. 2014. Scalable and Adaptive Online Joins. Proc. VLDB Endow. 7, 6 (feb 2014), 441--452. https://doi.org/10.14778/2732279.2732281
[36]
C. Estan and J.F. Naughton. 2006. End-biased Samples for Join Cardinality Estimation. In 22nd International Conference on Data Engineering (ICDE'06). 20--20. https://doi.org/10.1109/ICDE.2006.61
[37]
Wenfei Fan, Ziyan Han, Yaoshu Wang, and Min Xie. 2022. Parallel Rule Discovery from Large Datasets by Sampling. In Proceedings of the 2022 International Conference on Management of Data (Philadelphia, PA, USA) (SIGMOD '22). Association for Computing Machinery, New York, NY, USA, 384--398. https://doi.org/10.1145/3514221.3526165
[38]
Raphaël Féraud, Fabrice Clérot, and Pascal Gouzien. 2010. Sampling the Join of Streams. In Classification as a Tool for Research, Hermann Locarek-Junge and Claus Weihs (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 307--314.
[39]
Flink. 2022. "Package org.apache.flink.api.java.sampling". https://nightlies.apache.org/flink/flink-docs-master/api/ java/org/apache/flink/api/java/sampling/package-summary.html
[40]
Sumit Ganguly, Phillip B. Gibbons, Yossi Matias, and Avi Silberschatz. 1996. Bifocal Sampling for Skew-Resistant Join Size Estimation. SIGMOD Rec. 25, 2 (jun 1996), 271--281. https://doi.org/10.1145/235968.233340
[41]
Rajesh R. Bordawekar, and Philip S. Yu. 2009. CellJoin: A Parallel Stream Join Operator for the Cell Processor. The VLDB Journal 18, 2 (apr 2009), 501--519. https://doi.org/10.1007/s00778-008-0116-z
[42]
Bugra Gedik, Kun-Lung Wu, Philip S. Yu, and Ling Liu. 2007. GrubJoin: An Adaptive, Multi-Way, Windowed Stream Join with Time Correlation-Aware CPU Load Shedding. IEEE Transactions on Knowledge and Data Engineering 19, 10 (2007), 1363--1380. https://doi.org/10.1109/TKDE.2007.190630
[43]
Bugra Gedik, Kun-Lung Wu, Philip S. Yu, and Ling Liu. 2007. A Load Shedding Framework and Optimizations for M-way Windowed Stream Joins. In 2007 IEEE 23rd International Conference on Data Engineering. 536--545. https: //doi.org/10.1109/ICDE.2007.367899
[44]
Thanasis Georgiadis and Nikos Mamoulis. 2023. Raster Intervals: An Approximation Technique for Polygon Intersection Joins. Proc. ACM Manag. Data 1, 1, Article 36 (may 2023), 18 pages. https://doi.org/10.1145/3588716
[45]
Lukasz Golab, Shaveen Garg, and M Tamer Özsu. 2004. On indexing sliding windows over online data streams. In International Conference on Extending Database Technology. Springer, 712--729.
[46]
Xiangyang Gou, Long He, Yinda Zhang, Ke Wang, Xilai Liu, Tong Yang, Yi Wang, and Bin Cui. 2020. Sliding Sketches: A Framework Using Time Zones for Data Stream Processing in Sliding Windows. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Virtual Event, CA, USA) (KDD '20). Association for Computing Machinery, New York, NY, USA, 1015--1025. https://doi.org/10.1145/3394486.3403144
[47]
Rong Gu, Han Li, Haipeng Dai, Wenjie Huang, Jie Xue, Meng Li, Jiaqi Zheng, Haoran Cai, Yihua Huang, and Guihai Chen. 2023. ShadowAQP: Efficient Approximate Group-by and Join Query via Attribute-Oriented Sample Size Allocation and Data Generation. Proc. VLDB Endow. 16, 13 (sep 2023), 4216--4229. https://doi.org/10.14778/3625054.3625059
[48]
Sudipto Guha and Boulos Harb. 2005. Wavelet Synopsis for Data Streams: Minimizing Non-Euclidean Error. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (Chicago, Illinois, USA) (KDD '05). Association for Computing Machinery, New York, NY, USA, 88--97. https://doi.org/10.1145/1081870.1081884
[49]
Sudipto Guha, Nick Koudas, and Kyuseok Shim. 2006. Approximation and Streaming Algorithms for Histogram Construction Problems. ACM Trans. Database Syst. 31, 1 (mar 2006), 396--438. https://doi.org/10.1145/1132863.1132873
[50]
Vincenzo Gulisano, Zbigniew Jerzak, Spyros Voulgaris, and Holger Ziekow. 2016. The DEBS 2016 Grand Challenge. In Proceedings of the 10th ACM International Conference on Distributed and Event-Based Systems (Irvine, California) (DEBS '16). Association for Computing Machinery, New York, NY, USA, 289--292. https://doi.org/10.1145/2933267.2933519
[51]
Peter J. Haas. 2016. Data-Stream Sampling: Basic Techniques and Results. Springer Berlin Heidelberg, Berlin, Heidelberg, 13--44. https://doi.org/10.1007/978--3--540--28608-0_2
[52]
Peter J. Haas and Joseph M. Hellerstein. 1999. Ripple Joins for Online Aggregation. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (Philadelphia, Pennsylvania, USA) (SIGMOD '99). Association for Computing Machinery, New York, NY, USA, 287--298. https://doi.org/10.1145/304182.304208
[53]
Peter J. Haas and Christian König. 2004. A Bi-Level Bernoulli Scheme for Database Sampling. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (Paris, France) (SIGMOD '04). Association for Computing Machinery, New York, NY, USA, 275--286. https://doi.org/10.1145/1007568.1007601
[54]
Carole J. Hahn, Stephen G. Warren, and Julius London. 1996. Edited Synoptic Cloud Reports from Ships and Land Stations Over the Globe, 1982--1991 (NDP-026B). (1 1996). https://doi.org/10.3334/CDIAC/cli.ndp026b
[55]
Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, and Bin Cui. 2021. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. Proc. VLDB Endow. 15, 4 (dec 2021), 752--765. https://doi.org/10.14778/ 3503585.3503586
[56]
Gilseok HONG, Seonghyeon KANG, Chang soo KIM, and Jun-Ki MIN. 2018. Efficient Parallel Join Processing Exploiting SIMD in Multi-Thread Environments. IEICE Transactions on Information and Systems E101.D, 3 (2018), 659--667. https://doi.org/10.1587/transinf.2017EDP7300
[57]
D. G. Horvitz and D. J. Thompson. 1952. A Generalization of Sampling Without Replacement from a Finite Universe. J. Amer. Statist. Assoc. 47, 260 (1952), 663--685. https://doi.org/10.1080/01621459.1952.10483446
[58]
Xiao Hu, Yufei Tao, and Ke Yi. 2017. Output-Optimal Parallel Algorithms for Similarity Joins. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (Chicago, Illinois, USA) (PODS '17). Association for Computing Machinery, New York, NY, USA, 79--90. https://doi.org/10.1145/3034786.3056110
[59]
Dawei Huang, Dong Young Yoon, Seth Pettie, and Barzan Mozafari. 2019. Joins on Samples: A Theoretical Guide for Practitioners. Proc. VLDB Endow. 13, 4 (dec 2019), 547--560. https://doi.org/10.14778/3372716.3372726
[60]
Xinmei Huang, Haoyang Li, Jing Zhang, Xinxin Zhao, Zhiming Yao, Yiyan Li, Zhuohao Yu, Tieying Zhang, Hong Chen, and Cuiping Li. 2024. LLMTune: Accelerate Database Knob Tuning with Large Language Models. arXiv preprint arXiv:2404.11581 (2024).
[61]
Janardan and Shikha Mehta. 2017. Concept drift in Streaming Data Classification: Algorithms, Platforms and Issues. Procedia Computer Science 122 (2017), 804--811. https://doi.org/10.1016/j.procs.2017.11.440 5th International Conference on Information Technology and Quantitative Management, ITQM 2017.
[62]
Christopher Jermaine, Subramanian Arumugam, Abhijit Pol, and Alin Dobra. 2007. Scalable approximate query processing with the DBO engine. Proceedings of the ACM SIGMOD International Conference on Management of Data (2007), 725--736. https://doi.org/10.1145/1247480.1247560
[63]
Yuanzhen Ji, Jun Sun, Anisoara Nica, Zbigniew Jerzak, Gregor Hackenbroich, and Christof Fetzer. 2016. Qualitydriven disorder handling for m-way sliding window stream joins. In 2016 IEEE 32nd International Conference on Data Engineering (ICDE). 493--504. https://doi.org/10.1109/ICDE.2016.7498265
[64]
Wei Jiang, Liu-Gen Xu, Hai-Bo Hu, and Yue Ma. 2019. Improvement design for distributed real-time stream processing systems. Journal of Electronic Science and Technology 17, 1 (2019), 3--12.
[65]
Ming Jin, Yu Zheng, Yuan-Fang Li, Siheng Chen, Bin Yang, and Shirui Pan. 2023. Multivariate Time Series Forecasting With Dynamic Graph Neural ODEs. IEEE Transactions on Knowledge and Data Engineering 35, 9 (2023), 9168--9180. https://doi.org/10.1109/TKDE.2022.3221989
[66]
Theodore Johnson, Shanmugavelayutham Muthukrishnan, and Irina Rozenbaum. 2005. Sampling algorithms in a stream operator. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 1--12.
[67]
Evangelia Kalyvianaki, Wolfram Wiesemann, Quang Hieu Vu, Daniel Kuhn, and Peter Pietzuch. 2011. SQPR: Stream query planning with reuse. In 2011 IEEE 27th International Conference on Data Engineering. 840--851. https://doi.org/10.1109/ICDE.2011.576785
[68]
Srikanth Kandula, Anil Shanbhag, Aleksandar Vitorovic, Matthaios Olma, Robert Grandl, Surajit Chaudhuri, and Bolin Ding. 2016. Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters. In Proceedings of the 2016 International Conference on Management of Data (San Francisco, California, USA) (SIGMOD '16). Association for Computing Machinery, New York, NY, USA, 631--646. https://doi.org/10.1145/2882903.2882940
[69]
Jaewoo Kang, Jeffrey F Naughton, and Stratis D Viglas. 2003. Evaluating window joins over unbounded streams. In Proceedings 19th International Conference on Data Engineering (Cat. No. 03CH37405). IEEE, 341--352.
[70]
Jeyhun Karimov, Tilmann Rabl, Asterios Katsifodimos, Roman Samarev, Henri Heiskanen, and Volker Markl. 2018. Benchmarking Distributed Stream Data Processing Systems. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). 1507--1518. https://doi.org/10.1109/ICDE.2018.00169
[71]
Martin Kiefer, Max Heimel, Sebastian Breß, and Volker Markl. 2017. Estimating Join Selectivities Using Bandwidth- Optimized Kernel Density Models. Proc. VLDB Endow. 10, 13 (sep 2017), 2085--2096. https://doi.org/10.14778/3151106. 3151112
[72]
Changkyu Kim, Tim Kaldewey, Victor W. Lee, Eric Sedlar, Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, and Pradeep Dubey. 2009. Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs. Proc. VLDB Endow. 2, 2 (aug 2009), 1378--1389. https://doi.org/10.14778/1687553.1687564
[73]
Ilya Kolchinsky and Assaf Schuster. 2019. Real-Time Multi-Pattern Detection over Event Streams. In Proceedings of the 2019 International Conference on Management of Data (Amsterdam, Netherlands) (SIGMOD '19). Association for Computing Machinery, New York, NY, USA, 589--606. https://doi.org/10.1145/3299869.3319869
[74]
Hai Lan, Zhifeng Bao, and Yuwei Peng. 2021. A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration. Data Science and Engineering 6 (2021), 86--101.
[75]
Feifei Li, BinWu, Ke Yi, and Zhuoyue Zhao. 2016. Wander Join: Online Aggregation via RandomWalks. In Proceedings of the 2016 International Conference on Management of Data (San Francisco, California, USA) (SIGMOD '16). Association for Computing Machinery, New York, NY, USA, 615--629. https://doi.org/10.1145/2882903.2915235
[76]
F. Li, B. Wu, K. Yi, and Z. Zhao. 2017. Wander Join and XDB: Online Aggregation via Random Walks. Acm Sigmod Record 46, 1 (2017), 33--40.
[77]
Yanying Li, Haipei Sun, Boxiang Dong, and Hui (Wendy) Wang. 2018. Cost-Efficient Data Acquisition on Online Data Marketplaces for Correlation Analysis. Proc. VLDB Endow. 12, 4 (dec 2018), 362--375. https://doi.org/10.14778/ 3297753.3297757
[78]
Qian Lin, Beng Chin Ooi, ZhengkuiWang, and Cui Yu. 2015. Scalable Distributed Stream Join Processing. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (Melbourne, Victoria, Australia) (SIGMOD '15). Association for Computing Machinery, New York, NY, USA, 811--825. https://doi.org/10.1145/2723372.2746485
[79]
R. J. Lipton and J. F. Naughton. 1990. Query size estimation by adaptive sampling (extended abstract). Journal of Computer & System ences 51, 1 (1990), 18--25.
[80]
Jiesong Liu, Feng Zhang, Lv Lu, Chang Qi, Xiaoguang Guo, Dong Deng, Guoliang Li, Huanchen Zhang, Jidong Zhai, Hechen Zhang, Yuxing Chen, Anqun Pan, and Xiaoyong Du. 2024. G-Learned Index: Enabling Efficient Learned Index on GPU. IEEE Transactions on Parallel and Distributed Systems 35, 6 (2024), 950--967. https://doi.org/10.1109/TPDS. 2024.3381214
[81]
Tianyu Liu and Chi Wang. 2020. Understanding the hardness of approximate query processing with joins. arXiv:2010.00307 [cs.DB]
[82]
Clemens Lutz, Sebastian Breß, Steffen Zeuch, Tilmann Rabl, and Volker Markl. 2022. Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects. In Proceedings of the 2022 International Conference on Management of Data (Philadelphia, PA, USA) (SIGMOD '22). Association for Computing Machinery, New York, NY, USA, 1017--1032. https://doi.org/10.1145/3514221.3517911
[83]
Riccardo Mancini, Srinivas Karthik, Bikash Chandra, Vasilis Mageirakos, and Anastasia Ailamaki. 2022. Efficient Massively Parallel Join Optimization for Large Queries. In Proceedings of the 2022 International Conference on Management of Data (Philadelphia, PA, USA) (SIGMOD '22). Association for Computing Machinery, New York, NY, USA, 122--135. https://doi.org/10.1145/3514221.3517871
[84]
Wanli Min and LauraWynter. 2011. Real-time road traffic prediction with spatio-temporal correlations. Transportation Research Part C: Emerging Technologies 19, 4 (2011), 606--616. https://doi.org/10.1016/j.trc.2010.10.002
[85]
J. Misra and David Gries. 1982. Finding repeated elements. Science of Computer Programming 2, 2 (1982), 143--152. https://doi.org/10.1016/0167--6423(82)90012-0
[86]
Aloysius K. Mok, Honguk Woo, and Chan-Gun Lee. 2006. Probabilistic Timing Join over Uncertain Event Streams. In 12th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA'06). 17--26. https://doi.org/10.1109/RTCSA.2006.52
[87]
Barzan Mozafari. 2017. Approximate Query Engines: Commercial Challenges and Research Opportunities. In Proceedings of the 2017 ACM International Conference on Management of Data (Chicago, Illinois, USA) (SIGMOD '17). Association for Computing Machinery, New York, NY, USA, 521--524. https://doi.org/10.1145/3035918.3056098
[88]
Mohammadreza Najafi, Mohammad Sadoghi, and Hans-Arno Jacobsen. 2016. SplitJoin: A Scalable, Low-Latency Stream Join Architecture with Adjustable Ordering Precision. In Proceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference (Denver, CO, USA) (USENIX ATC '16). USENIX Association, USA, 493--505.
[89]
Creator of the Angry Birds game. 2022. Flink. Retrieved March 19, 2022 from https://nightlies.apache.org/flink/flinkdocs- release-1.14/
[90]
Adegoke Ojewole, Qiang Zhu, and Wen-Chi Hou. 2006. Window Join Approximation over Data Streams with Importance Semantics. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management (Arlington, Virginia, USA) (CIKM '06). Association for Computing Machinery, New York, NY, USA, 112--121. https://doi.org/10.1145/1183614.1183635
[91]
Frank Olken. 1993. Random sampling from databases. thesis UC Berkeley (1993), 172. http://www.cs.washington.edu/ education/courses/cse590q/05au/papers/Olken-Sampling.pdf
[92]
Margaret A. Palmer, Christine C. Hakenkamp, and Kären Nelson-Baker. 1997. Ecological Heterogeneity in Streams: Why Variance Matters. Journal of the North American Benthological Society 16, 1 (1997), 189--202. http://www.jstor. org/stable/1468251
[93]
Niketan Pansare, Vinayak Borkar, Chris Jermaine, and Tyson Condie. 2011. Online Aggregation for Large MapReduce Jobs. Proc. VLDB Endow. 4, 11 (aug 2011), 1135--1145. https://doi.org/10.14778/3402707.3402748
[94]
Jinglin Peng, Bolin Ding, JiannanWang, Kai Zeng, and Jingren Zhou. 2022. One Size Does Not Fit All: A Bandit-Based Sampler Combination Framework with Theoretical Guarantees. In Proceedings of the 2022 International Conference on Management of Data (Philadelphia, PA, USA) (SIGMOD '22). Association for Computing Machinery, New York, NY, USA, 531--544. https://doi.org/10.1145/3514221.3517900
[95]
Vibhor Porwal, Subrata Mitra, Fan Du, John Anderson, Nikhil Sheoran, Anup Rao, Tung Mai, Gautam Kowshik, Sapthotharan Nair, Sameeksha Arora, and Saurabh Mahapatra. 2022. Efficient Insights Discovery through Conditional Generative Model Based Query Approximation. In Proceedings of the 2022 International Conference on Management of Data (Philadelphia, PA, USA) (SIGMOD '22). Association for Computing Machinery, New York, NY, USA, 2397--2400. https://doi.org/10.1145/3514221.3520161
[96]
Navneet Potti and Jignesh M. Patel. 2015. DAQ: A New Paradigm for Approximate Query Processing. Proc. VLDB Endow. 8, 9 (may 2015), 898--909. https://doi.org/10.14778/2777598.2777599
[97]
Xu Qian, Yang Juan, Zhang Feng, Chen Zheng, Guan Jiawei, Chen Kang, Fan Ju, Shen Youren, Yang Ke, Zhang Yu, and Du Xiaoyong. 2024. Improving Graph Compression for Efficient Resource-Constrained Graph Analytics. PVLDB (2024). https://doi.org/10.14778/3665844.3665852
[98]
Yuan Qiu, Serafeim Papadias, and Ke Yi. 2019. Streaming HyperCube: A Massively Parallel Stream Join Algorithm. In Advances in Database Technology - 22nd International Conference on Extending Database Technology, EDBT 2019, Lisbon, Portugal, March 26--29, 2019, Melanie Herschel, Helena Galhardas, Berthold Reinwald, Irini Fundulaki, Carsten Binnig, and Zoi Kaoudi (Eds.). OpenProceedings.org, 642--645. https://doi.org/10.5441/002/edbt.2019.76
[99]
Do Le Quoc, Ruichuan Chen, Pramod Bhatotia, Christof Fetzer, Volker Hilt, and Thorsten Strufe. 2017. StreamApprox: Approximate Computing for Stream Analytics. In Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference (Las Vegas, Nevada) (Middleware '17). Association for Computing Machinery, New York, NY, USA, 185--197. https: //doi.org/10.1145/3135974.3135989
[100]
Chuitian Rong, Chunbin Lin, Yasin N. Silva, Jianguo Wang, Wei Lu, and Xiaoyong Du. 2017. Fast and Scalable Distributed Set Similarity Joins for Big Data Analytics. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE). 1059--1070. https://doi.org/10.1109/ICDE.2017.151
[101]
Pratanu Roy, Arijit Khan, and Gustavo Alonso. 2016. Augmented Sketch: Faster and More Accurate Stream Processing. In Proceedings of the 2016 International Conference on Management of Data (San Francisco, California, USA) (SIGMOD '16). Association for Computing Machinery, New York, NY, USA, 1449--1463. https://doi.org/10.1145/2882903.2882948
[102]
Pratanu Roy, Jens Teubner, and Rainer Gemulla. 2014. Low-Latency Handshake Join. Proc. VLDB Endow. 7, 9 (may 2014), 709--720. https://doi.org/10.14778/2732939.2732944
[103]
Florin Rusu and Alin Dobra. 2008. Sketches for Size of Join Estimation. ACM Trans. Database Syst. 33, 3, Article 15 (sep 2008), 46 pages. https://doi.org/10.1145/1386118.1386121
[104]
Viktor Sanca and Anastasia Ailamaki. 2022. Sampling-Based AQP in Modern Analytical Engines. In Data Management on New Hardware (Philadelphia, PA, USA) (DaMoN'22). Association for Computing Machinery, New York, NY, USA, Article 4, 8 pages. https://doi.org/10.1145/3533737.3535095
[105]
Viktor Sanca, Periklis Chrysogelos, and Anastasia Ailamaki. 2023. LAQy: Efficient and Reusable Query Approximations via Lazy Sampling. Proc. ACM Manag. Data 1, 2, Article 174 (jun 2023), 26 pages. https://doi.org/10.1145/3589319
[106]
Majid Rostami Shahrbabaki, Ali Akbar Safavi, Markos Papageorgiou, and Ioannis Papamichail. 2018. A data fusion approach for real-time traffic state estimation in urban signalized links. Transportation Research Part C: Emerging Technologies 92 (2018), 525--548. https://doi.org/10.1016/j.trc.2018.05.020
[107]
Ali Mohammadi Shanghooshabad, Meghdad Kurmanji, Qingzhi Ma, Michael Shekelyan, Mehrdad Almasi, and Peter Triantafillou. 2021. PGMJoins: Random Join Sampling with Graphical Models. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Association for Computing Machinery, 1610--1622. https://doi.org/10.1145/3448016.3457302
[108]
Michael Shekelyan, Graham Cormode, Peter Triantafillou, Ali Mohammadi Shanghooshabad, and Qingzhi Ma. 2022. Weighted Random Sampling over Joins. CoRR abs/2201.02670 (2022). arXiv:2201.02670 https://arxiv.org/abs/2201.02670
[109]
Nikhil Sheoran, Subrata Mitra, Vibhor Porwal, Siddharth Ghetia, Jatin Varshney, Tung Mai, Anup Rao, and Vikas Maddukuri. 2022. Conditional Generative Model Based Predicate-Aware Query Approximation. Proceedings of the AAAI Conference on Artificial Intelligence 36, 8 (Jun. 2022), 8259--8266. https://doi.org/10.1609/aaai.v36i8.20800
[110]
Spark. 2022. Java API for random utilities in Spark. https://spark.apache.org/docs/3.4.0/api/java/org/apache/spark/util/random/
[111]
Utkarsh Srivastava, Kamesh Munagala, and Jennifer Widom. 2005. Operator Placement for In-Network Stream Query Processing. In Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (Baltimore, Maryland) (PODS '05). Association for Computing Machinery, New York, NY, USA, 250--258. https://doi.org/10.1145/1065167.1065199
[112]
Utkarsh Srivastava and Jennifer Widom. 2004. Memory-Limited Execution of Windowed Stream Joins. In Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30 (Toronto, Canada) (VLDB '04). VLDB Endowment, 324--335.
[113]
Michal Startek. 2016. An asymptotically optimal, online algorithm for weighted random sampling with replacement. ArXiv abs/1611.00532 (2016).
[114]
György Steinbrecher and William T. Shaw. 2008. Quantile mechanics. European Journal of Applied Mathematics 19, 2 (2008), 87--112. https://doi.org/10.1017/S0956792508007341
[115]
Yufei Tao, Xiang Lian, Dimitris Papadias, and Marios Hadjieleftheriou. 2007. Random Sampling for Continuous Streams with Arbitrary Updates. IEEE Transactions on Knowledge and Data Engineering 19, 1 (2007), 96--110. https://doi.org/10.1109/TKDE.2007.250588
[116]
Jens Teubner and Rene Mueller. 2011. How Soccer Players Would Do Stream Joins. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (Athens, Greece) (SIGMOD '11). Association for Computing Machinery, New York, NY, USA, 625--636. https://doi.org/10.1145/1989323.1989389
[117]
Lasse Thostrup, Gloria Doci, Nils Boeschen, Manisha Luthra, and Carsten Binnig. 2023. Distributed GPU Joins on Fast RDMA-Capable Networks. Proc. ACM Manag. Data 1, 1, Article 29 (may 2023), 26 pages. https://doi.org/10.1145/3588709
[118]
Daniel Ting. 2022. Adaptive Threshold Sampling. In Proceedings of the 2022 International Conference on Management of Data (Philadelphia, PA, USA) (SIGMOD '22). Association for Computing Machinery, New York, NY, USA, 1612--1625. https://doi.org/10.1145/3514221.3526122
[119]
Daniel Ting and Rick Cole. 2021. Conditional Cuckoo Filters. Association for Computing Machinery, New York, NY, USA, 1838--1850. https://doi.org/10.1145/3448016.3452811
[120]
Wee Hyong Tok, Stéphane Bressan, and Mong-Li Lee. 2008. A Stratified Approach to Progressive Approximate Joins. In Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology (Nantes, France) (EDBT '08). Association for Computing Machinery, New York, NY, USA, 582--593. https://doi.org/10.1145/1353343.1353414
[121]
Jonas Traub, Philipp Marian Grulich, Alejandro Rodriguez Cuellar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, and Volker Markl. 2019. Efficient Window Aggregation with General Stream Slicing. In International Conference on Extending Database Technology.
[122]
David Vengerov, Andre Cavalheiro Menck, Mohamed Zait, and Sunil P. Chakkappen. 2015. Join Size Estimation Subject to Filter Conditions. Proc. VLDB Endow. 8, 12 (aug 2015), 1530--1541. https://doi.org/10.14778/2824032.2824051
[123]
Jeffrey S. Vitter. 1985. Random Sampling with a Reservoir. ACM Trans. Math. Softw. 11, 1 (mar 1985), 37--57. https://doi.org/10.1145/3147.3165
[124]
Martha Vlachou-Konchylaki. 2016. Efficient Data Stream Sampling on Apache Flink.
[125]
Feiyu Wang, Qizhi Chen, Yuanpeng Li, Tong Yang, Yaofeng Tu, Lian Yu, and Bin Cui. 2023. JoinSketch: A Sketch Algorithm for Accurate and Unbiased Inner-Product Estimation. Proc. ACM Manag. Data 1, 1, Article 81 (may 2023), 26 pages. https://doi.org/10.1145/3588935
[126]
Y. Wang, A. Khan, X. Xu, J. Jin, Q. Hong, and T. Fu. 2022. Aggregate Queries on Knowledge Graphs: Fast Approximation with Semantic-aware Sampling. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE Computer Society, Los Alamitos, CA, USA, 2914--2927. https://doi.org/10.1109/ICDE53745.2022.00263
[127]
Xiaohui Wei, Yuanyuan Liu, Xingwang Wang, Bingyi Sun, Shang Gao, and Jon Rokne. 2019. A survey on quality-assurance approximate stream processing and applications. Future Generation Computer Systems 101 (2019), 1062--1080. https://doi.org/10.1016/j.future.2019.07.047
[128]
A.N. Wilschut and P.M.G. Apers. 1991. Dataflow query execution in a parallel main-memory environment. In Proceedings of the First International Conference on Parallel and Distributed Information Systems. 68--77. https://doi.org/10.1109/PDIS.1991.183069
[130]
Xinle Wu, Dalin Zhang, Miao Zhang, Chenjuan Guo, Bin Yang, and Christian S. Jensen. 2023. AutoCTS: Joint Neural Architecture and Hyperparameter Search for Correlated Time Series Forecasting. Proc. ACM Manag. Data 1, 1, Article 97 (may 2023), 26 pages. https://doi.org/10.1145/3588951
[131]
Le Xu, Shivaram Venkataraman, Indranil Gupta, Luo Mai, and Rahul Potharaju. 2021. Move fast and meet deadlines: Fine-grained real-time stream processing with cameo. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). 389--405.
[132]
Yu Ya-xin, Yang Xing-hua, Yu Ge, and Wu Shan-shan. 2006. An indexed non-equijoin algorithm based on sliding windows over data streams. Wuhan University Journal of Natural Sciences 11, 1 (2006), 294--298.
[133]
Zongheng Yang and Chenggang Wu. 2019. GitHub repository: naru project. https://github.com/naru-project/naru
[134]
Alpaslan Yarar. 2014. A Hybrid Wavelet and Neuro-Fuzzy Model for Forecasting the Monthly Streamflow Data. Water Resources Management 28 (2014), 553--565.
[135]
Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova, Nurjahan Begum, Yifei Ding, Hoang Anh Dau, Zachary Zimmerman, Diego Furtado Silva, Abdullah Mueen, and Eamonn Keogh. 2018. Time Series Joins, Motifs, Discords and Shapelets: A Unifying View That Exploits the Matrix Profile. Data Min. Knowl. Discov. 32, 1 (jan 2018), 83--123. https://doi.org/10.1007/s10618-017-0519-9
[136]
Kai Zeng, Shi Gao, Barzan Mozafari, and Carlo Zaniolo. 2014. The Analytical Bootstrap: A New Method for Fast Error Estimation in Approximate Query Processing. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (Snowbird, Utah, USA) (SIGMOD '14). Association for Computing Machinery, New York, NY, USA, 277--288. https://doi.org/10.1145/2588555.2588579
[137]
Feng Zhang, Jidong Zhai, Xipeng Shen, Onur Mutlu, and Xiaoyong Du. 2022. POCLib: A High-Performance Framework for Enabling Near Orthogonal Processing on Compression. IEEE Transactions on Parallel and Distributed Systems 33, 2 (2022), 459--475. https://doi.org/10.1109/TPDS.2021.3093234
[138]
Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G. Andersen, Michael Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2018. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In Proceedings of the 2018 International Conference on Management of Data (Houston, TX, USA) (SIGMOD '18). Association for Computing Machinery, New York, NY, USA, 323--336. https://doi.org/10.1145/3183713.3196931
[139]
Jiaoyi Zhang, Kai Su, and Huanchen Zhang. 2024. Making In-Memory Learned Indexes Efficient on Disk. Proc. ACM Manag. Data 2, 3, Article 151 (may 2024), 26 pages. https://doi.org/10.1145/3654954
[140]
Jiayao Zhang, Qiheng Sun, Jinfei Liu, Li Xiong, Jian Pei, and Kui Ren. 2023. Efficient Sampling Approaches to Shapley Value Approximation. Proc. ACM Manag. Data 1, 1, Article 48 (may 2023), 24 pages. https://doi.org/10.1145/3588728
[141]
Shuhao Zhang, Yancan Mao, Jiong He, Philipp M. Grulich, Steffen Zeuch, Bingsheng He, Richard T. B. Ma, and Volker Markl. 2021. Parallelizing Intra-Window Join on Multicores: An Experimental Study. Association for Computing Machinery, New York, NY, USA, 2089--2101. https://doi.org/10.1145/3448016.3452793
[142]
Shuhao Zhang, Feng Zhang, Yingjun Wu, Bingsheng He, and Paul Johns. 2020. Hardware-Conscious Stream Processing: A Survey. SIGMOD Rec. 48, 4 (feb 2020), 18--29. https://doi.org/10.1145/3385658.3385662
[143]
Xiaojian Zhang, Wanchang Jiang, Yadong Zhang, and Cong Huo. 2007. Clustering-Variable-Width Histogram Based Window Semi-hash Multi-join over Streams. In 2007 International Conference on Convergence Information Technology (ICCIT 2007). 850--853. https://doi.org/10.1109/ICCIT.2007.246
[144]
Bo Zhao, Han van der Aa, Thanh Tam Nguyen, Quoc Viet Hung Nguyen, and Matthias Weidlich. 2021. EIRES: Efficient Integration of Remote Data in Event Stream Processing. In Proceedings of the 2021 International Conference on Management of Data (Virtual Event, China) (SIGMOD '21). Association for Computing Machinery, New York, NY, USA, 2128--2141. https://doi.org/10.1145/3448016.3457304
[145]
Zhuoyue Zhao, Robert Christensen, Feifei Li, Xiao Hu, and Ke Yi. 2018. Random Sampling over Joins Revisited. In Proceedings of the 2018 International Conference on Management of Data (Houston, TX, USA) (SIGMOD '18). Association for Computing Machinery, New York, NY, USA, 1525--1539. https://doi.org/10.1145/3183713.3183739

Published In

Proceedings of the ACM on Management of Data  Volume 2, Issue 4
SIGMOD
September 2024
458 pages
EISSN:2836-6573
DOI:10.1145/3698442
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 September 2024
Published in PACMMOD Volume 2, Issue 4

Author Tags

  1. approximate join processing
  2. data stream
  3. stream sampling

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
