DOI: 10.1145/3308558.3313626
research-article

Link Prediction in Networks with Core-Fringe Data

Published: 13 May 2019

Abstract

Data collection often involves the partial measurement of a larger system. A common example arises in collecting network data: we often obtain network datasets by recording all of the interactions among a small set of core nodes, so that we end up with a measurement of the network consisting of these core nodes along with a potentially much larger set of fringe nodes that have links to the core. Given the ubiquity of this process for assembling network data, it is crucial to understand the role of such a “core-fringe” structure.
Here we study how the inclusion of fringe nodes affects the standard task of network link prediction. One might initially think the inclusion of any additional data is useful, and hence that it should be beneficial to include all fringe nodes that are available. However, we find that this is not true; in fact, there is substantial variability in the value of the fringe nodes for prediction. Once an algorithm is selected, in some datasets, including any additional data from the fringe can actually hurt prediction performance; in other datasets, including some amount of fringe information is useful before prediction performance saturates or even declines; and in further cases, including the entire fringe leads to the best performance. While such variety might seem surprising, we show that these behaviors are exhibited by simple random graph models.
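
The effect described above can be illustrated with a minimal sketch. The abstract does not name the prediction algorithm or the datasets, so everything here is an assumption for illustration: the toy graph, the node names, the helper `common_neighbor_scores`, and the choice of the common-neighbors heuristic as the scoring rule. It scores pairs of core nodes twice, once using only core-core edges and once with fringe nodes included.

```python
from itertools import combinations


def common_neighbor_scores(edges, core, allowed_nodes):
    """Score every pair of core nodes by their number of common neighbors.

    Only edges whose endpoints are both in `allowed_nodes` are used, so
    passing `core` ignores the fringe and passing `core | fringe` includes it.
    (Hypothetical helper, not taken from the paper.)
    """
    adjacency = {}
    for u, v in edges:
        if u in allowed_nodes and v in allowed_nodes:
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(core), 2):
        shared = adjacency.get(u, set()) & adjacency.get(v, set())
        scores[(u, v)] = len(shared)
    return scores


# Toy core-fringe data (assumed): every edge touches at least one core node,
# so links between two fringe nodes are never observed.
core = {"a", "b", "c"}
fringe = {"x", "y"}
edges = [("a", "b"), ("b", "c"), ("a", "x"), ("c", "x"), ("b", "y")]

core_only = common_neighbor_scores(edges, core, core)
with_fringe = common_neighbor_scores(edges, core, core | fringe)

for pair in sorted(core_only):
    print(pair, "core only:", core_only[pair], "with fringe:", with_fringe[pair])
```

In this toy graph the fringe node `x` adds evidence for the pair `("a", "c")`, while `y` changes no score. Whether such added evidence improves or degrades ranking quality on real data is the question the paper studies; the sketch only shows how the scores shift when the fringe is included.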

Published In

WWW '19: The World Wide Web Conference
May 2019
3620 pages
ISBN:9781450366748
DOI:10.1145/3308558

In-Cooperation

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '19: The Web Conference
May 13 - 17, 2019
San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
