research-article

An RDMA-enabled In-memory Computing Platform for R-tree on Clusters

Published: 12 February 2022

Abstract

The R-tree is a foundational data structure in spatial and scientific databases. With advances in networks and computer architectures, in-memory R-tree processing on distributed systems has become a common platform. As multidimensional datasets grow increasingly large, we have observed new performance challenges in processing R-trees. Specifically, an R-tree server can be heavily overloaded while the network and client CPUs are lightly loaded, and vice versa.
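To make the workload concrete, the sketch below shows the classic Guttman-style R-tree range search: descend only into children whose minimum bounding rectangle (MBR) intersects the query. It is a minimal, single-machine illustration; the two-dimensional rectangles, the `Node` layout, and the names `intersects` and `range_search` are assumptions for illustration, not Catfish's data layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Axis-aligned rectangle as (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """True if two rectangles overlap; shared edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    """An R-tree node: each entry pairs a minimum bounding rectangle (MBR)
    with a child Node (internal node) or a data id (leaf node)."""
    leaf: bool
    entries: List[Tuple[Rect, object]] = field(default_factory=list)


def range_search(node: Node, query: Rect) -> List[object]:
    """Return the ids of all leaf entries whose MBR intersects `query`."""
    results: List[object] = []
    for mbr, payload in node.entries:
        if not intersects(mbr, query):
            continue  # the MBR bounds the whole subtree, so prune it
        if node.leaf:
            results.append(payload)
        else:
            results.extend(range_search(payload, query))
    return results


# Tiny hand-built two-level tree.
leaf_a = Node(leaf=True, entries=[((0, 0, 1, 1), "a1"), ((2, 2, 3, 3), "a2")])
leaf_b = Node(leaf=True, entries=[((10, 10, 11, 11), "b1")])
root = Node(leaf=False, entries=[((0, 0, 3, 3), leaf_a), ((10, 10, 11, 11), leaf_b)])

print(range_search(root, (0.5, 0.5, 2.5, 2.5)))  # ['a1', 'a2']
```

Only the subtree under (0, 0, 3, 3) is visited; the entry for `leaf_b` is pruned at the root. In a client-server deployment, the question is which side runs this loop: a server thread or the client.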
In this article, we present the design and implementation of Catfish, an RDMA-enabled R-tree that achieves low latency and high throughput by adaptively using the available network bandwidth and computing resources to balance workloads between clients and servers. We design and implement two basic mechanisms for using RDMA in a client-server R-tree data processing system. First, in the fast messaging design, clients use RDMA writes to send R-tree requests to the server, and server threads process these requests to achieve low query latency. Second, in the RDMA offloading design, clients use RDMA reads to traverse the tree themselves, offloading tree traversal from the server and relieving it when it is overloaded. We further develop an adaptive scheme that switches R-tree searches between fast messaging and RDMA offloading to maximize overall performance. Our experiments show that the adaptive solution of Catfish on InfiniBand significantly outperforms R-tree designs that use only fast messaging or only RDMA offloading, in both latency and throughput. Catfish also delivers up to an order of magnitude higher performance than traditional TCP/IP-based schemes on 1 Gbps and 40 Gbps Ethernet. We make a strong case for using RDMA to balance workloads effectively in distributed systems for low latency and high throughput.
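The two access paths and the idea of switching between them can be illustrated with a toy, single-process simulation. Nothing here uses real RDMA: a `deque` stands in for the server's RDMA-written request buffer, a list lookup stands in for a one-sided RDMA read of a node at a known remote address, and the server thread is run synchronously. The switching rule (offload while an exponentially weighted moving average of fast-messaging latency exceeds `threshold`, with decay so the client later probes the server again) and the names `Server`, `AdaptiveClient`, `threshold`, and `alpha` are hypothetical; this is not the paper's adaptive scheme.

```python
from collections import deque
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def traverse(read_node, query: Rect, root: int = 0) -> List[int]:
    """Iterative range search; `read_node(i)` returns node i as
    (is_leaf, [(mbr, child_index_or_data_id), ...])."""
    results, stack = [], [root]
    while stack:
        is_leaf, entries = read_node(stack.pop())
        for mbr, ref in entries:
            if not intersects(mbr, query):
                continue
            if is_leaf:
                results.append(ref)
            else:
                stack.append(ref)
    return sorted(results)


class Server:
    """Stores nodes in a flat list, standing in for a registered memory
    region whose nodes clients can address directly."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.inbox = deque()   # stands in for an RDMA-written request buffer
        self.cpu_searches = 0  # searches executed by server threads

    def post_request(self, query: Rect) -> None:
        """Fast messaging, client side: simulated 'RDMA write' of a request."""
        self.inbox.append(query)

    def serve_one(self) -> List[int]:
        """Fast messaging, server side: a server thread runs the search."""
        self.cpu_searches += 1
        return traverse(self.read_node, self.inbox.popleft())

    def read_node(self, idx: int):
        """Offloading: simulated one-sided 'RDMA read' of one node (no server CPU)."""
        return self.nodes[idx]


class AdaptiveClient:
    """Hypothetical policy: offload while the smoothed fast-messaging
    latency exceeds `threshold` (arbitrary units)."""

    def __init__(self, server: Server, threshold: float, alpha: float = 0.5):
        self.server, self.threshold, self.alpha = server, threshold, alpha
        self.ewma = 0.0  # exponentially weighted moving average of latency

    def observe_latency(self, latency: float) -> None:
        self.ewma = self.alpha * latency + (1 - self.alpha) * self.ewma

    def search(self, query: Rect) -> Tuple[str, List[int]]:
        if self.ewma > self.threshold:
            # Decay the estimate so the client eventually probes the server again.
            self.ewma *= 1 - self.alpha
            return "offload", traverse(self.server.read_node, query)
        self.server.post_request(query)
        return "fast", self.server.serve_one()


nodes = [
    (False, [((0, 0, 3, 3), 1), ((10, 10, 11, 11), 2)]),  # root
    (True, [((0, 0, 1, 1), 101), ((2, 2, 3, 3), 102)]),
    (True, [((10, 10, 11, 11), 201)]),
]
server = Server(nodes)
client = AdaptiveClient(server, threshold=5.0)

print(client.search((0.5, 0.5, 2.5, 2.5)))  # ('fast', [101, 102])
client.observe_latency(20.0)                  # ewma becomes 10.0 > 5.0
print(client.search((10.5, 10.5, 12, 12)))    # ('offload', [201]); ewma decays to 5.0
print(client.search((0, 0, 0.5, 0.5)))        # ('fast', [101])
print(server.cpu_searches)                    # 2
```

Storing nodes in a flat, index-addressable array is what makes client-side traversal possible at all: a client can only issue a one-sided read if it knows where a node lives. In this sketch every visited node costs one read, so an offloaded search trades extra network round trips for zero server CPU, which is why it pays off mainly when the server is the bottleneck.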


Published In

ACM Transactions on Spatial Algorithms and Systems, Volume 8, Issue 2
June 2022
253 pages
ISSN: 2374-0353
EISSN: 2374-0361
DOI: 10.1145/3506671

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 February 2022
Accepted: 01 December 2021
Revised: 01 September 2021
Received: 01 March 2021
Published in TSAS Volume 8, Issue 2

Author Tags

  1. RDMA
  2. R-tree

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Science Foundation

Article Metrics

  • Downloads (last 12 months): 85
  • Downloads (last 6 weeks): 9
Reflects downloads up to 06 Jan 2025
