DOI:10.1145/3459637.3482435
research-article

HASTE: A Distributed System for Hybrid and Adaptive Processing on Streaming Spatial-Textual Data

Published: 30 October 2021

Abstract

Streaming spatial-textual data, which carries both geographic and textual information (e.g., geo-tagged tweets), is growing at an unprecedented rate. Continuous spatial-textual queries, a basic operation that retrieves real-time results over large-scale spatial-textual streams, call for efficient distributed processing. However, existing proposals either are spatial-aware only or exploit textual information only superficially for pruning. We propose a distributed system, called HASTE, for hybrid and adaptive processing on streaming spatial-textual data. The novelty lies in three aspects: (1) we propose a novel method that reduces the workload beforehand by dividing objects and queries into mutually exclusive types; (2) we develop a novel load partitioning strategy and a novel cost model that consider both spatial and textual properties; (3) we design a multi-level load adjustment strategy that adaptively copes with different degrees of load imbalance. We report on extensive experiments with real-world data that offer insight into the performance of the solution and show that it is capable of outperforming the state-of-the-art proposals.
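
To make the notion of a continuous spatial-textual query concrete, the following is a minimal single-machine sketch in Python. It is not the HASTE method and not code from the paper. It assumes, purely for illustration, that a query has a rectangular range and matches an object only when the object lies in that range and contains all of the query's keywords. The class and function names (`GeoTextObject`, `ContinuousQuery`, `match_stream`) are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class GeoTextObject:
    """A streamed object such as a geo-tagged tweet (hypothetical schema)."""
    obj_id: int
    x: float
    y: float
    terms: FrozenSet[str]


@dataclass(frozen=True)
class ContinuousQuery:
    """A standing query: rectangular range plus required keywords.

    Both the rectangle and the all-keywords rule are assumptions made
    for this sketch; the paper may define queries differently.
    """
    query_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    keywords: FrozenSet[str]

    def matches(self, obj: GeoTextObject) -> bool:
        # Spatial predicate: the object's point lies inside the rectangle.
        in_range = (self.x_min <= obj.x <= self.x_max
                    and self.y_min <= obj.y <= self.y_max)
        # Textual predicate: every query keyword appears in the object.
        return in_range and self.keywords <= obj.terms


def match_stream(queries: List[ContinuousQuery],
                 stream: Iterable[GeoTextObject]) -> Iterator[Tuple[int, int]]:
    """Yield (query_id, obj_id) pairs as objects arrive.

    This is a brute-force baseline: each object is tested against
    every registered query, with no index and no partitioning.
    """
    for obj in stream:
        for query in queries:
            if query.matches(obj):
                yield (query.query_id, obj.obj_id)


def demo() -> List[Tuple[int, int]]:
    queries = [
        ContinuousQuery(1, 0.0, 0.0, 10.0, 10.0, frozenset({"coffee"})),
        ContinuousQuery(2, 5.0, 5.0, 20.0, 20.0, frozenset({"coffee", "wifi"})),
    ]
    stream = [
        GeoTextObject(10, 2.0, 3.0, frozenset({"coffee", "open"})),
        GeoTextObject(11, 6.0, 7.0, frozenset({"coffee", "wifi", "quiet"})),
        GeoTextObject(12, 30.0, 30.0, frozenset({"coffee", "wifi"})),
    ]
    return list(match_stream(queries, stream))
```

Calling `demo()` registers two queries and pushes three objects through them. In this baseline every object is compared with every query, so the cost grows with the product of stream rate and query count. That per-object work is what a distributed design would aim to reduce and balance across workers.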

Supplementary Material

MP4 File (cikm-rgfp1588.mp4)
Presentation video of HASTE: A Distributed System for Hybrid and Adaptive Processing on Streaming Spatial-Textual Data. Streaming spatial-textual data, e.g., geo-tagged tweets, is growing at an unprecedented rate. Continuous queries, a basic operation that retrieves real-time results over large-scale spatial-textual streams, call for efficient distributed processing. However, existing proposals exploit textual information only superficially for pruning. We propose HASTE, a distributed system for hybrid and adaptive processing on streaming spatial-textual data. We propose a novel method to reduce the workload beforehand, and we develop a novel load partitioning strategy and a novel cost model that consider both spatial and textual properties. In addition, we design an adaptive load adjustment strategy. We report on extensive experiments with real-world data, and the results show that the solution is capable of outperforming the state-of-the-art proposals.


Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN:9781450384469
DOI:10.1145/3459637
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. continuous query
  2. distributed system
  3. load partition
  4. streaming spatial-textual data

Qualifiers

  • Research-article

Conference

CIKM '21
Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
