research-article

A community-based sampling method using DPL for online social networks

Published: 10 June 2015

Abstract

In this paper, we propose a new graph sampling method for online social networks that satisfies three requirements. First, a sample graph should preserve the ratio between the number of nodes and the number of edges in the original graph. Second, a sample graph should reflect the topology of the original graph. Third, sample graphs drawn from the same original graph should be consistent with one another. The proposed method employs two techniques: hierarchical community extraction and the densification power law (DPL). To preserve the topology of the original graph, it partitions the original graph into a set of communities. To preserve the node-to-edge ratio, it uses the densification power law, which captures the relationship between the number of nodes and the number of edges in online social networks. In experiments, we use several real-world online social networks, create sample graphs using existing methods and ours, and analyze how the sample graph produced by each method differs from the original graph.
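As a rough illustration of how a densification power law can set an edge budget for a sample, the sketch below assumes the common form e = c * n^alpha, where n is the number of nodes, e the number of edges, and alpha the densification exponent. The function name `dpl_target_edges` and the choice to fit c from the original graph's own node and edge counts are assumptions made for illustration; the abstract does not specify how the proposed method applies the law.

```python
def dpl_target_edges(n_orig: int, e_orig: int, n_sample: int, alpha: float) -> int:
    """Estimate how many edges a sample with n_sample nodes should contain.

    Assumes the densification power law e = c * n**alpha and fits the
    constant c so that the original graph (n_orig, e_orig) lies on the curve.
    Both the fitting choice and the exponent value are illustrative
    assumptions, not taken from the paper.
    """
    if n_orig <= 0 or e_orig < 0:
        raise ValueError("original graph must have positive node count and non-negative edge count")
    if not 0 < n_sample <= n_orig:
        raise ValueError("n_sample must be in the range 1..n_orig")
    # Scale the original edge count by the node ratio raised to alpha;
    # this is algebraically the same as c * n_sample**alpha.
    return round(e_orig * (n_sample / n_orig) ** alpha)


if __name__ == "__main__":
    # Hypothetical graph: 1,000 nodes and 10,000 edges, sampled down to 100 nodes.
    for alpha in (1.0, 1.5, 2.0):
        print(alpha, dpl_target_edges(1000, 10000, 100, alpha))
```

With alpha = 1 the sample keeps the original average degree, while larger exponents give smaller samples proportionally fewer edges, which is the node-to-edge relationship the first requirement in the abstract refers to.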

Published In

Information Sciences: an International Journal, Volume 306, Issue C
June 2015
180 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Densification power law
  2. Graph sampling
  3. Online social network

Qualifiers

  • Research-article
