
Toward Fast and Scalable Random Walks over Disk-Resident Graphs via Efficient I/O Management

Published: 11 November 2022

Abstract

Traditional graph systems mainly use an iteration-based model, which repeatedly loads graph blocks into memory for analysis so as to reduce random I/Os. However, this model limits the efficiency and scalability of random walks, a fundamental technique for analyzing large graphs. In this article, we first propose a state-aware I/O model to improve the I/O efficiency of running random walks. We then develop a block-centric indexing and buffering scheme for managing walk data, and leverage an asynchronous walk-updating strategy to improve random walk efficiency. We implement an I/O-efficient graph system, GraphWalker, which efficiently handles very large disk-resident graphs and scales to tens of billions of random walks on a single commodity machine. Experiments show that GraphWalker achieves more than an order of magnitude speedup over DrunkardMob, a random walk system built on the classical graph system GraphChi, as well as over two state-of-the-art single-machine graph systems, Graphene and GraFSoft. Furthermore, compared with the recent distributed system KnightKing, GraphWalker achieves comparable performance with only a single machine, making it a more cost-effective alternative.
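
To make the three ideas in the abstract concrete, the following is a minimal in-memory sketch, not GraphWalker's implementation. It assumes that "state-aware" means loading the block that currently holds the most walks, that walks are pooled per block, and that "asynchronous updating" means a walk keeps moving for as long as it stays inside the loaded block. The function `run_walks`, its parameters, and the dictionary-based graph and partitioning are hypothetical names introduced only for illustration; disk I/O is simulated by counting block loads.

```python
import random
from collections import defaultdict


def run_walks(adj, block_of, starts, walk_len, seed=0):
    """Toy simulation of block-at-a-time random walks (illustrative only).

    adj:      dict mapping vertex -> list of out-neighbors
    block_of: dict mapping vertex -> block id (hypothetical partitioning)
    starts:   list of start vertices, one entry per walk
    Returns (visit counts per vertex, number of simulated block loads).
    """
    rng = random.Random(seed)
    # Block-centric walk pool: block id -> list of (current vertex, hops left).
    pool = defaultdict(list)
    for v in starts:
        pool[block_of[v]].append((v, walk_len))

    visits = defaultdict(int)
    block_loads = 0
    while any(pool.values()):
        # State-aware choice (assumption): load the block holding the most walks.
        b = max(pool, key=lambda k: len(pool[k]))
        walks, pool[b] = pool[b], []
        block_loads += 1
        for v, hops in walks:
            # Asynchronous updating: advance this walk while it stays in block b,
            # instead of moving every walk exactly one step per iteration.
            while hops > 0 and block_of[v] == b:
                visits[v] += 1
                nbrs = adj.get(v, [])
                if not nbrs:
                    hops = 0  # dead end: terminate the walk
                    break
                v = rng.choice(nbrs)
                hops -= 1
            if hops > 0:
                # The walk crossed into another block: buffer it there.
                pool[block_of[v]].append((v, hops))
    return dict(visits), block_loads


if __name__ == "__main__":
    # A 4-vertex cycle split into two blocks of two vertices each.
    adj = {0: [1], 1: [2], 2: [3], 3: [0]}
    block_of = {0: 0, 1: 0, 2: 1, 3: 1}
    visits, loads = run_walks(adj, block_of, starts=[0, 0, 2], walk_len=4)
    print(visits, loads)
```

In this sketch, each block load advances every buffered walk by at least one hop, so the number of simulated loads depends on where the walks currently are rather than on a fixed sweep over all blocks, which is the contrast with the iteration-based model that the abstract draws.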


Information

Published In

ACM Transactions on Storage  Volume 18, Issue 4
November 2022
255 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/3570642

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 November 2022
Online AM: 27 September 2022
Accepted: 25 April 2022
Revised: 19 January 2022
Received: 30 August 2021

Author Tags

  1. Graph processing system
  2. random walk
  3. graph storage

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Key R&D Program of China
  • Youth Innovation Promotion Association CAS
  • GRF
  • National Science Foundation of China
