CompressGraph: Efficient Parallel Graph Analytics with Rule-Based Compression

Published: 30 May 2023

Abstract

Modern graphs exert colossal time and space pressure on graph analytics applications. In 2022, the Facebook social graph reached 2.91 billion users with trillions of edges. To address this challenge, many compression algorithms have been developed that support direct processing on compressed graphs. However, previous graph compression algorithms do not focus on leveraging redundancy in repeated neighbor sequences, so they do not reduce the amount of computation required for graph analytics. We develop CompressGraph, an efficient rule-based graph analytics engine that leverages data redundancy in graphs to achieve both a performance boost and space reduction for common graph applications. CompressGraph has three advantages over previous work. First, the rule-based abstraction of CompressGraph supports the reuse of intermediate results during graph traversal, thus saving time. Second, CompressGraph is expressive enough to support a wide range of graph applications. Third, CompressGraph scales well under high parallelism because the context-free rules have few dependencies. Experiments show that CompressGraph provides significant performance and space benefits on both CPUs and GPUs. Across six typical graph applications, CompressGraph achieves a 1.97× speedup on the CPU and a 3.95× speedup on the GPU, compared to the state-of-the-art CPU and GPU methods, respectively. Moreover, CompressGraph reduces memory usage by an average of 71.27% on the CPU and 70.36% on the GPU.
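To make the idea of reusing intermediate results concrete, the following is a minimal sketch and not CompressGraph's actual implementation. It assumes a simple Re-Pair-style scheme in which the most frequent adjacent pair of symbols in the neighbor lists is repeatedly replaced by a new rule. The function names `compress` and `neighbor_sums` and the example graph are hypothetical. Each rule's partial sum is computed once and then reused by every neighbor list that references the rule.

```python
from collections import Counter


def compress(adj, num_vertices):
    """Replace the most frequent adjacent symbol pair with a new rule, repeatedly.

    Assumption: a simplified Re-Pair-style scheme chosen for illustration only.
    Symbols below num_vertices are vertices; larger ids denote rules.
    """
    lists = [list(neighbors) for neighbors in adj]
    rules = {}  # rule id -> (left symbol, right symbol)
    next_id = num_vertices
    while True:
        pairs = Counter()
        for seq in lists:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # no pair repeats, so nothing left to share
            break
        rules[next_id] = pair
        for seq in lists:
            i = 0
            while i < len(seq) - 1:
                if (seq[i], seq[i + 1]) == pair:
                    seq[i:i + 2] = [next_id]
                i += 1
        next_id += 1
    return lists, rules


def neighbor_sums(lists, rules, values, num_vertices):
    """Sum values over each vertex's neighbors, evaluating every rule only once."""
    rule_sum = {}

    def value_of(symbol):
        return values[symbol] if symbol < num_vertices else rule_sum[symbol]

    # A rule only references symbols created before it, so ascending order is safe.
    for rule_id in sorted(rules):
        left, right = rules[rule_id]
        rule_sum[rule_id] = value_of(left) + value_of(right)
    return [sum(value_of(s) for s in seq) for seq in lists]


if __name__ == "__main__":
    # Hypothetical 5-vertex graph in which the neighbor sequence 2, 3, 4 repeats.
    adj = [[2, 3, 4], [2, 3, 4], [0, 2, 3, 4], [1], [0, 1]]
    values = [1, 2, 3, 4, 5]
    lists, rules = compress(adj, len(adj))
    print(lists, rules)
    print(neighbor_sums(lists, rules, values, len(adj)))
```

In this sketch the shared sequence is summed once inside its rule rather than once per vertex, which is the kind of saving the abstract attributes to rule-based processing. A real engine would also need parallel scheduling and traversal-specific handling that this example omits.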

Supplemental Material

MP4 File
Presentation video
PDF File
Read me
ZIP File
Source Code

[144]
Feng Zhang, Jidong Zhai, Xipeng Shen, Onur Mutlu, and Xiaoyong Du. 2020c. Enabling efficient random access to hierarchically-compressed data. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 1069--1080.
[145]
Feng Zhang, Jidong Zhai, Xipeng Shen, Onur Mutlu, and Xiaoyong Du. 2022c. POCLib: a high-performance framework for enabling near orthogonal processing on compression. IEEE Transactions on Parallel and Distributed Systems, Vol. 33, 2 (2022), 459--475.
[146]
Feng Zhang, Jidong Zhai, Xipeng Shen, Dalin Wang, Zheng Chen, Onur Mutlu, Wenguang Chen, and Xiaoyong Du. 2021c. TADOC: Text analytics directly on compression. The VLDB Journal, Vol. 30, 2 (2021), 163--188.
[147]
Feng Zhang, Jidong Zhai, Bo Wu, Bingsheng He, Wenguang Chen, and Xiaoyong Du. 2019. Automatic irregularity-aware fine-grained workload partitioning on integrated architectures. IEEE Transactions on Knowledge and Data Engineering (2019).
[148]
Wentao Zhang, Xupeng Miao, Yingxia Shao, Jiawei Jiang, Lei Chen, Olivier Ruas, and Bin Cui. 2020a. Reliable Data Distillation on Graph Convolutional Network. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14--19, 2020, David Maier, Rachel Pottinger, AnHai Doan, Wang-Chiew Tan, Abdussalam Alawini, and Hung Q. Ngo (Eds.). ACM, 1399--1414. https://doi.org/10.1145/3318464.3389706
[149]
Wentao Zhang, Yu Shen, Yang Li, Lei Chen, Zhi Yang, and Bin Cui. 2021b. ALG: Fast and Accurate Active Learning Framework for Graph Convolutional Networks. In SIGMOD '21: International Conference on Management of Data, Virtual Event, China, June 20--25, 2021, Guoliang Li, Zhanhuai Li, Stratos Idreos, and Divesh Srivastava (Eds.). ACM, 2366--2374. https://doi.org/10.1145/3448016.3457325
[150]
Xiaofei Zhang, M. Tamer Özsu, and Lei Chen. 2020b. ELite: Cost-effective Approximation of Exploration-based Graph Analysis. In GRADES-NDA'20: Proceedings of the 3rd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), Portland, OR, USA, June 14, 2020, Akhil Arora, Semih Salihoglu, and Nikolay Yakovets (Eds.). ACM, 6:1--6:10. https://doi.org/10.1145/3398682.3399164
[151]
Yu Zhang, Feng Zhang, Hourun Li, Shuhao Zhang, and Xiaoyong Du. 2023. CompressStreamDB: Fine-Grained Adaptive Stream Processing without Decompression. In 2023 IEEE 39th International Conference on Data Engineering (ICDE). IEEE.
[152]
Jianlong Zhong and Bingsheng He. 2013. Medusa: Simplified graph processing on GPUs. IEEE Transactions on Parallel and Distributed Systems, Vol. 25, 6 (2013), 1543--1552.
[153]
Amelie Chi Zhou, Juanyun Luo, Ruibo Qiu, Haobin Tan, Bingsheng He, and Rui Mao. 2022. Adaptive Partitioning for Large-Scale Graph Analytics in Geo-Distributed Data Centers. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). 2818--2830. https://doi.org/10.1109/ICDE53745.2022.00256
[154]
Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A computation-centric distributed graph processing system. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 301--316.
[155]
Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. Gridgraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning. In 2015 USENIX Annual Technical Conference (USENIX ATC 15). 375--386.
[156]
Yuqing Zhu, Jing Tang, and Xueyan Tang. 2020. Pricing Influential Nodes in Online Social Networks. Proc. VLDB Endow., Vol. 13, 10 (2020), 1614--1627. https://doi.org/10.14778/3401960.3401961

Published In

Proceedings of the ACM on Management of Data  Volume 1, Issue 1
PACMMOD
May 2023
2807 pages
EISSN:2836-6573
DOI:10.1145/3603164
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023
Published in PACMMOD Volume 1, Issue 1

Author Tags

  1. compressed data direct processing
  2. compression
  3. graph analytic

Qualifiers

  • Research-article
