research-article

Reduction of large-scale graphs: Effective edge shedding at a controllable ratio under resource constraints

Published: 15 March 2022

Abstract

As technology advances, many complex systems can be represented as networks (graphs). However, the volume of generated data is growing so quickly that users with limited computing resources, such as portable or personal desktop computers, cannot store and mine large-scale graphs. To address this challenge, we present effective edge shedding, which reduces the amount of data to be processed and the corresponding storage space while speeding up graph algorithms and queries, thereby supporting interactive analysis, aiding knowledge discovery, and eliminating noise. To extract the underlying features of a graph, we propose two effective edge shedding methods based on preserving the expected vertex degree. Both methods let users control the edge shedding process and thus generate a reduced graph of a predefined size that fits the computing resource constraint. Using four real-world datasets from different domains, we performed an extensive experimental evaluation of our methods and compared them with the state-of-the-art graph summarization method on seven graph analysis tasks. The results indicate that our methods achieve up to 58.6% higher accuracy on graph analysis tasks than the state-of-the-art method. For very large datasets, our methods consume only 0.3% of the running time of the competing method when generating the reduced graph. These results illustrate the advantages of our methods.
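
To make the notion of shedding edges at a controllable ratio while preserving the expected vertex degree concrete, the following minimal sketch sheds edges uniformly at random. It is a simple baseline under stated assumptions, not either of the two methods proposed in this paper: the function names `shed_edges` and `degrees`, the `ratio` parameter, and the toy graph are all hypothetical. Because every edge survives with the same probability k/|E|, where k is the number of edges kept, the expected degree of each vertex in the reduced graph is its original degree scaled by that same factor.

```python
import random
from collections import Counter


def shed_edges(edges, ratio, seed=None):
    """Keep round(ratio * len(edges)) edges chosen uniformly at random.

    Illustrative baseline only (an assumption, not the paper's method).
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)
    keep = round(ratio * len(edges))
    # Sampling without replacement gives a reduced graph of exactly `keep` edges.
    return rng.sample(edges, keep)


def degrees(edges):
    """Count the degree of every vertex in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Hypothetical toy graph: a star centred at vertex 0 plus a ring over 1..8.
edges = [(0, i) for i in range(1, 9)] + [(i, i % 8 + 1) for i in range(1, 9)]
reduced = shed_edges(edges, ratio=0.5, seed=42)
```

Here the toy graph has 16 edges, so `ratio=0.5` keeps exactly 8 of them; the hub vertex 0 has degree 8 in the original graph and an expected degree of 4 in the reduced one.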



Published In

Knowledge-Based Systems, Volume 240, Issue C
Mar 2022
830 pages

Publisher

Elsevier Science Publishers B. V.
Netherlands

Author Tags

1. Degree distribution
2. Edge shedding
3. Graph reduction
4. Limited resources

