A community-based sampling method using DPL for online social networks

Authors:

Sunju Park

Information Sciences—Informatics and Computer Science, Intelligent Systems, Applications: An International Journal, Volume 306, Issue C

Pages 53 - 69

https://doi.org/10.1016/j.ins.2015.02.014

Published: 10 June 2015

Abstract

In this paper, we propose a new graph sampling method for online social networks that satisfies three requirements. First, a sample graph should reflect the ratio between the number of nodes and the number of edges of the original graph. Second, a sample graph should reflect the topology of the original graph. Third, sample graphs drawn from the same original graph should be consistent with one another. The proposed method employs two techniques: hierarchical community extraction and the densification power law (DPL). To preserve the topology of the original graph, the method partitions it into a set of communities. To preserve the node-edge ratio, it uses the densification power law, which captures the ratio between the number of nodes and the number of edges in online social networks. In experiments, we use several real-world online social networks, create sample graphs with existing methods and with ours, and analyze how the sample graph produced by each sampling method differs from the original graph.
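The densification power law relates the edge count e of a growing network to its node count n as e ≈ c·n^a, with the exponent a typically between 1 and 2. The abstract does not spell out how the method applies it, so the following is only a minimal sketch under stated assumptions: the constants c and a are fitted by least squares in log-log space from (nodes, edges) snapshots, and the fitted law is then used to pick a target edge count for a sample of a given node count. The function names `fit_dpl` and `target_edges` and the snapshot values are hypothetical, not taken from the paper.

```python
import math


def fit_dpl(snapshots):
    """Fit e = c * n**a to (n, e) pairs by least squares on log n vs. log e.

    Assumption: `snapshots` is a list of (node_count, edge_count) pairs
    with at least two distinct node counts. Returns (c, a).
    """
    xs = [math.log(n) for n, _ in snapshots]
    ys = [math.log(e) for _, e in snapshots]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct node counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                      # slope in log-log space = DPL exponent
    c = math.exp(mean_y - a * mean_x)  # intercept gives the constant factor
    return c, a


def target_edges(c, a, n_sample):
    """Edge count the fitted law predicts for a graph with n_sample nodes."""
    return round(c * n_sample ** a)


# Hypothetical snapshots of one network at three sizes (illustrative data).
snapshots = [(100, 2000), (400, 16000), (1600, 128000)]
c, a = fit_dpl(snapshots)
edges_for_sample = target_edges(c, a, 900)
print(f"c={c:.3f}, a={a:.3f}, target edges for 900 nodes: {edges_for_sample}")
```

A sampler could use such a target to decide how many edges to keep once it has chosen the nodes; how the proposed method combines this with the extracted communities is described in the full paper rather than in the abstract.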

Published In

Information Sciences: an International Journal Volume 306, Issue C

June 2015

180 pages

ISSN:0020-0255

Copyright © Elsevier Inc.

Publisher

Elsevier Science Inc.

United States

Qualifiers

Research-article

Bibliometrics & Citations

Article Metrics

Total Citations: 10
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Jan 2025

Citations

Cited By

Jiao B, Lu X, Xia J, Gupta B, Bao L, Zhou Q (2023). Hierarchical Sampling for the Visualization of Large Scale-Free Graphs. IEEE Transactions on Visualization and Computer Graphics, 29:12 (5111-5123). DOI: 10.1109/TVCG.2022.3201567. Online publication date: 1-Dec-2023
https://dl.acm.org/doi/10.1109/TVCG.2022.3201567
Di Tizio G, Siu G, Hutchings A, Massacci F (2023). A Graph-Based Stratified Sampling Methodology for the Analysis of (Underground) Forums. IEEE Transactions on Information Forensics and Security, 18 (5473-5483). DOI: 10.1109/TIFS.2023.3304424. Online publication date: 1-Jan-2023
https://dl.acm.org/doi/10.1109/TIFS.2023.3304424
Zeng Y, Song C, Ge T, Zhang Y (2022). Reduction of large-scale graphs. Knowledge-Based Systems, 240:C. DOI: 10.1016/j.knosys.2022.108126. Online publication date: 15-Mar-2022
https://dl.acm.org/doi/10.1016/j.knosys.2022.108126
Roohollahi S, Khatibi Bardsiri A, Keynia F (2022). Sampling in weighted social networks using a levy flight-based learning automata. The Journal of Supercomputing, 78:1 (1458-1478). DOI: 10.1007/s11227-021-03905-2. Online publication date: 1-Jan-2022
https://dl.acm.org/doi/10.1007/s11227-021-03905-2
Chen S, Randyanto Y, Cheng S (2019). Fuzzy queries processing based on intuitionistic fuzzy social relational networks. Information Sciences: an International Journal, 327:C (110-124). DOI: 10.1016/j.ins.2015.07.054. Online publication date: 6-Jan-2019
https://dl.acm.org/doi/10.1016/j.ins.2015.07.054
Han T, Tian Y, Lan Y, Li F, Xiao L (2018). Revealing the densest communities of social networks efficiently through intelligent data space reduction. Expert Systems with Applications: An International Journal, 94:C (70-80). DOI: 10.1016/j.eswa.2017.10.047. Online publication date: 15-Mar-2018
https://dl.acm.org/doi/10.1016/j.eswa.2017.10.047
Bhatia V, Rani R (2018). DFuzzy. Knowledge and Information Systems, 57:1 (159-181). DOI: 10.1007/s10115-018-1156-3. Online publication date: 1-Oct-2018
https://dl.acm.org/doi/10.1007/s10115-018-1156-3
Rezvanian A, Meybodi M (2017). Sampling algorithms for stochastic graphs. Knowledge-Based Systems, 127:C (126-144). DOI: 10.1016/j.knosys.2017.04.012. Online publication date: 1-Jul-2017
https://dl.acm.org/doi/10.1016/j.knosys.2017.04.012
Lu J, Wang H (2017). Uniform random sampling not recommended for large graph size estimation. Information Sciences: an International Journal, 421:C (136-153). DOI: 10.1016/j.ins.2017.08.030. Online publication date: 1-Dec-2017
https://dl.acm.org/doi/10.1016/j.ins.2017.08.030
Li Y, Gai K, Qiu L, Qiu M, Zhao H (2017). Intelligent cryptography approach for secure distributed big data storage in cloud computing. Information Sciences: an International Journal, 387:C (103-115). DOI: 10.1016/j.ins.2016.09.005. Online publication date: 1-May-2017
https://dl.acm.org/doi/10.1016/j.ins.2016.09.005
