Academia.eduAcademia.edu

Network structure revealed by short cycles

2006

Network Structure Revealed by Short Cycles James Bagrow,1, ∗ Erik Bollt,2, 1, † and Luciano da F. Costa3, ‡ arXiv:cond-mat/0612502v1 [cond-mat.dis-nn] 19 Dec 2006 1 Department of Physics, Clarkson University, Potsdam, NY 13699-5820, USA. 2 Department of Mathematics and Computer Science, Clarkson University, Potsdam, NY 13699-5815, USA. 3 Instituto de Fı́sica de São Carlos. Universidade de São Paulo, São Carlos, SP, PO Box 369, 13560-970, Brazil (Dated: February 6, 2008) This article explores the relationship between communities and short cycles in complex networks, based on the fact that nodes more densely connected amongst one another are more likely to be linked through short cycles. By identifying combinations of 3-, 4- and 5-edge-cycles, a subnetwork is obtained which contains only those nodes and links belonging to such cycles, which can then be used to highlight community structure. Examples are shown using a theoretical model (Sznajd networks) and a real-world network (NCAA football). I. INTRODUCTION Complex networks have attracted growing attention because of their non-uniform connectivity patterns, which may give rise to node degree power laws and hubs, known to play an important role in defining several topological properties of the networks [1, 2, 3]. More recently, the fact that many complex networks include communities, i.e. sets of nodes which connect more intensely amongst one another than with the rest of the network, has become the focus of increasing attention (e.g. [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]). Indeed, because of statistical fluctuations, even random networks [14, 15] can be found to exhibit communities [16, 17]. Although we still lack a clear-cut definition of a community, the problem of identifying communities in complex networks continues to motivate interest from researchers because of the importance that those structures have for better understanding the general organization of such complex structures (e.g. [18]). Another important feature of complex networks are the cycles of different lengths which underlie the connectivity of the several models of networks [19]. Actually, the statistical distribution of cycles has been acknowledged as particularly important for defining not only the topology of the respective networks, but also the dynamics of systems running on such frameworks(e.g. [20]). The latter is a direct consequence of the fact that cycles, through feedback, form the scaffolding of memory in dynamical systems. Generally, the density of cycles tends to increase as more edges are incorporated into a network, with longer cycles being observed earlier than shorter ones (e.g. [21]). Therefore, the density of cycles of different lengths can be used as an indicator of the connectivity between any subset of nodes. In other words, the larger the num- ∗ Electronic address: [email protected] address: [email protected] ‡ Electronic address: [email protected] † Electronic ber of shortest cycles among a subset of nodes, the more connected such nodes are to one another. Longer cycles tend to grow, “coiled up”, alongside these shorter cycles, however, blurring the distinction between nodes based solely on short-cycle participation. We present methods to overcome this. The article starts by presenting the cycle finding algorithm and its application as the core of the community finding algorithm and proceeds by illustrating the application of such a methodology to community finding in a theoretical complex network model (i.e. Sznajd networks [22]) and a real-world football network. II. DESCRIBING SHORT CYCLES For a graph G = {V, E}, n = |V |, m = |E|, we are interested in finding cycles of length 3,4, or 5 containing some starting vertex v ∈ V . To describe these cycles we begin by decomposing G into shells Si about v. We define shell Si to be the set of all vertices (and edges between those vertices) at a distance i from the starting vertex v. Since we are only interested in cycles of length ≤ 5, we need only to keep shells S1 and S2 . It is simple to describe all possible short cycles using these shell decompositions. For example, for every edge eij in S1 about v, there exists a 3-cycle (triangle) v–i–j– v. Similarly, for every path of length 2 or 3 in S1 , there exists a 4- or 5-cycle, respectively. Another 4-cycle and two more 5-cycles exist involving both S1 and S2 . In general, for a cycle of length L ≥ 3, the number of such possible “cases” grows with L. Since it requires 2 edges to visit a shell, an L-cycle can visit at most J shells, where  L L even,  2, J= (1)  L−1 , L odd. 2 If the farthest shell the cycle visits is Sj , j < J, there are at most L − 2j remaining edges that must be distributed between and within the S1 , S2 , ...Sj shells. The 2 number of ways to distribute L − 2j edges over j shells is (L−2j+j−1)! (L−2j)!(j−1)! . However, it is possible for a cycle to “zigzag” between shells, using more than the 2j edges necessary to visit the j shells. Therefore, the total number of possible ways to distribute an L-cycle is at least: Nl (L) = 1+ J J−j X (i + j − 2)! (L − 2i − j − 1)! X , i!(j − 2)! (L − 2j − 2i)!(j − 1)! j=2 i=0 (2) with the outer sum accounting for all the possible shells the cycle can visit, the inner sum for all the optional pairs of edges that can lie between shells and the +1 for the one possible cycle that visits the first shell only. Here, i is the number of pairs of edges between shells beyond the j necessary to visit the j shells. This calculation fails to take into account permutations of the ordering of edges between and within two adjacent shells. A simple upper bound is possible, however, as there are certainly no more than L! possible permutations over the whole network: Nu (L) = 1+ J J−j X X j=2 i=0 (i + j − 2)! (L − 2i − j − 1)! L!, (3) i!(j − 2)! (L − 2j − 2i)!(j − 1)! with Nl (L) ≤ N (L) ≤ Nu (L). III. (4) CYCLES AND COMMUNITIES is the graph G containing only edges that do not participate in j-cycles. Separate communities in G will appear as disconnected components in H. We interpret vertices with degree zero in H as communities of size one. In specifying H, the question of what to choose for j has been left open. For example, choosing just j = {3} will correspond to deleting all edges from G that participate in 3-cycles, generally not a useful result. One may consider j to be a tunable parameter, used to get a desired result when applied to a specific network. One issue that can occur is that longer cycles often overlap shorter cycles. In terms of communities, most inter-community edges contain few (if any) short cycles, but intra-community edges tend to contain both long and short cycles, since a long cycle can “coil” inside the community. If one were to just delete all 5-cycles in a graph, it is very possible to end up deleting all edges. There is quite a bit of leeway in how we choose j and build H, and we can use this to our advantage. For example, pick two cycle lengths s and t, s < t and compute Cs and Ct . Then, build another set of edges, Ct\s Ct\s ≡ Ct \ Cs , (8) containing the set of edges that participate in t-cycles but not s-cycles. The graph H = {V, Ct\s } will contain edges that tend to be between communities and not within, for an appropriate choice of t and s. One can think of this as a “backbone” of the network, and deleting these edges may be a useful pre-processing step for applying other community-detection algorithms, including betweenness [4, 10]. IV. APPLICATION EXAMPLES For a graph G, a cycle C is a subset of the set of edges E containing a continuous path, where the first and last node of the path are the same [23]. Permutations of cycles may be ignored since we will be working exclusively with sets of edges. Throughout this work, we limit ourselves to short cycles, typically those of length l, 3 ≤ l < 6. These shorter cycles may provide the advantage of faster calculation times. Community structure can be studied by comparing the edges covered by these cycles with the original graph. Let We present example applications of the methods presented in Section III to two networks: a network of NCAA Division I-A football games held during the 2005 regular season [30] and a Sznajd network [24]. In addition, we discuss how these methods can break down and ways to overcome that. Cl (i) ≡ the set of edges traversed by all l-cycles starting from vertex i In NCAA football, teams are grouped into conferences based on location. To save on transportation time and cost, more games are played between teams in the same conference than in different conferences. Thus, a graph of the game schedule, where nodes are teams and edges connect teams that have played against each other, naturally exhibits community structure based on these conferences [25]. Figure 1a displays the original network, call it G. As a first pass, let’s use j = {3} and generate G3 = {V, C}, pictured in Figure 1b using the same layout as 1a. This deletes all edges that do not participate in 3-cycles. (5) Starting from all vertices and limiting ourselves to only short j-cycles [29], [[ Cj (i). (6) C ≡ i∈V j Then, for a graph G, we construct a graph H where, H = {V, E \ C} (7) A. Football Network 3 Most deleted edges are between conferences, though some edges remain. This will not split the network into seperate components based on the communities but it may be useful as a preprocessing step for betweenness or another community detection algorithm. In addition, let us build Ct\s , as per Equation 8. For this network, we have chosen t = 5, s = 3. Figure 1c shows G5\3 = {V, Ct\s }, again using the same layout as 1a. For improved clarity, Figure 1d shows G5\3 with a layout emphasizing that all edges are between conferences. We propose that edges in C5\3 comprise the majority of this network’s inter-community structure. To test this, one can compare the distributions of edge betweenness for these backbone and non-backbone edges, as shown in Figure 2a. Backbone edges tend to carry much higher betweenness values than the more common non-backbone edges. B. Sznajd Network which are connected to the nodes i and j are established with probability p. An analogue procedure is considered with respect to edges which are absent. In order to avoid convergence to the trivial ground states where all edges are set on or off, the dynamics also consider as feedback the total number of established edges. Figure 3a shows a Sznajd Network. Edges that do not participate in 3-cycles are indicated. As can be seen, many of these edges fall “outside” of the more dense regions of the network. This is a good first pass, and may be used to initialize another algorithm, similar to our football result, but it will not give detailed information on the hierarchical community structure. Figure 3b shows the same network as 3a, but with the edges of C5\3 highlighted. One can imagine removing both the C3 and C5\3 edges to further enhance the separation. V. CONCLUDING REMARKS One particularly interesting category of complex networks are the so-called geographical models (e.g. [27, 28]), whose nodes have well-defined positions in an embedding metric space S. Typically, the connectivity in such networks is affected by the adjacency and/or the distance between pairs of nodes, with nodes which are closer one another having higher probability of being connected. As an immediate consequence of such an organizing principle, communities in traditional geographical communites are closely related to the presence of spatial clusters of nodes, i.e. groups of nodes which are closer one another than with the rest of the network. Introduced recently, the family of geographical networks known as Sznajd networks [22] allow rich community structure as a consequence of running the Sznajd opinion formation dynamics [24] among the network edges instead of considering the states associated to each network node. Starting with a traditional geographical network (called the underlying network Γ) where the connections are defined with probability proportional to the distances between pairs of nodes, a percentage of edges of Γ are removed, yielding the initial condition for the Sznajd dynamics. Then, edges from Γ are chosen randomly and used to influence the respective surrounding connectivity. For instance, in case the chosen edge (i, j) is on (i.e. it does correspond to a link in the current growth stage), the edges in Γ The identification and characterization of the communities present in complex networks stands out as one of the most important approaches for understanding their structure and possible formation and evolution. At the same time, the distribution of cycles of various lengths in a complex network has important implications for the connectivity, resilience and dynamics of the respectively studied networks. The current work brought together these two important trends, in the sense of applying short cycle detection as the means to help the identification of communities in complex networks. The suggested methodology has been applied with promising results to the identification of communities in a theoretical network model, more specifically a Sznajd geometrical networks, as well as to a real-world network (NCAA). The relationship between the cycles and communities in the football network has been further investigated in terms of the betweeness centrality measurement, confirming that the obtained backbone edges tend to exhibit higher betweeness values. [1] S.-H. Yook, H. Jeong, and A.-L. Barabási, Proc Natl Acad Sci USA 99, 13382 (2002). [2] S. N. Dorogovtsev and J. F. F. Mendes, Advances in Physics 51, 1079 (2002), cond-mat/0106144. [3] S. Bornholdt and H. G. Schuster, eds., Handbook of Graphs and Networks: From the Genome to the Internet (John Wiley & Sons, Inc., New York, NY, USA, 2003), ISBN 3527403361. [4] M. Girvan and M. E. J. Newman, Proc Natl Acad Sci USA 99, 7821 (2002). [5] M. E. J. Newman and M. Girvan, in Statistical Mechanics of Complex Networks, edited by R. Pastor-Satorras, J. Rubi, and A. Diaz-Guilera (Springer, Berlin, 2003). [6] M. E. J. Newman, The European Physical Journal B 38, Acknowledgments: L. da F. Costa thanks FAPESP (05/00587-5) and CNPq (308231/03-1) for financial support. 4 T T T T T C C S S T W T C S P W P S S S A B I A B S S B A A B B B A B B A A A B A A A U S X X U B M X B B A A B M A A E X U X X M X X X E M M X X (a) E M M X X X E M M U X M X M U E E M M X U M E E M X U E I U E I M M M A A A A E E M M X A I A B E U X S P P P P P M M X U I B M X A B M U U E P P A B B E P P B E P W S B I B T T W C C C B I T W S S P A I B B P P P P S P P S S P T W W C C S W W S S C C C C T W W C W C C C C T W W C C S W W S S S C C C S C T T T W W C T T (b) T T T T T C C S C S T W W T C C S W W P W S A P B I S S B A B B B A A B B A A B A A A A U M E U X X U M W P E E B M X M X P A X B U M W U B P S W S S T A C B C X A P M E P C T B (c) W A B C A I M B U I S M X S C W X P X W M X T E A S X E M M M U X E M M X X B C X I W X M M W B E M X U A U E M T T E C E S P S M U U C T X S E E X I I B B X M I C A P A I B P P A B E P P P X P A P P S S T W C C C S B M T T W S S S S C C C W W C U A S (d) FIG. 1: (color online) The NCAA Div I-A 2005 regular season with all edges (a), with 3-cycles only (b), and with just C5\3 edges (c). (d) is the same graph as (c) but with a layout emphasizing that no edges within conferences remain (degree zero nodes omitted). As per [26], the conferences are: A = Atlantic Coast, B = Big 12, C = Conference USA, E = Big East, I = Independent, M = Mid-American, P = Pacific Ten, S = Southeastern, T = Western Athletic, U = Sun Belt, W = Mountain West, X = Big Ten. 321 (2004). [7] M. E. J. Newman and M. Girvan, Physical Review E 69, 026113 (2004). [8] M. E. J. Newman, Phys. Rev. E 69, 066133 (2004). [9] A. Clauset, M. E. J. Newman, and C. Moore, Phys. Rev. E 70, 066111 (2004). [10] J. P. Bagrow and E. M. Bollt, Phys. Rev. E 72, 046108 (2005), cond-mat/0412482. 5 (a) (b) FIG. 2: (color online) Histogram of edge betweenness for non-backbone edges (red) and backbone edges (blue) for the NCAA 2005 football network (a) and the Sznajd network shown in Figure 3 (b). For the football network, the mean (unnormalized) betweenness is 42.8 for non-backbone edges and 132.9 for backbone edges. Note that backbone and non-backbone histograms use the same bins; the front-most bins have been narrowed for clarity. The Sznajd non-backbone bins have also been scaled down by a factor of 25 for clarity. (a) (b) FIG. 3: A Sznajd network. Edges that do not participate in 3-cycles are dashed (a). Edges in C5\3 are bold (b). Note that nodes of degree zero have been omitted for clarity. [11] M. E. J. Newman, Proc Natl (2006). [12] M. E. J. Newman, Phys. Rev. [13] M. A. Porter, P. J. Mucha, A. J. Friend, submitted to physics/0602033. Acad Sci USA 103, 8577 E 74, 036104 (2006). M. E. J. Newman, and Social Networks (2006), [14] P. Erdös and A. Rényi, Publ. Math. 6, 290 (1959). [15] B. Bollobás, Random Graphs (Academic Press, London, 1985). [16] R. Guimera, M. Sales-Pardo, and L. A. N. Amaral, Phys. Rev. E 70, 025101 (2004). [17] J. Reichardt and S. Bornholdt (2006), cond- 6 mat/0606295. [18] R. Guimera and L. A. N. Amaral, Nature 433, 895 (2005). [19] H. D. Rozenfeld, J. E. Kirk, E. M. Bollt, and D. ben Avraham, J. Phys. A 38, 4589 (2005), condmat/0403536. [20] A. Arenas, A. Diaz-Guilera, and C. J. Perez-Vicente, Physical Review Letters 96, 114102 (2006), condmat/0511730. [21] L. da Fontoura Costa, Physical Review E 70, 056106 (2004), cond-mat/0312712. [22] L. da F. Costa, Intl. J. Mod. Phys. C 16, 1001 (2005). [23] B. Bollobás, Modern Graph Theory (Springer, New York, 1998). [24] K. Sznajd-Weron and J. Sznajd, Intl. J. Mod. Phys. C 11, 1a57 (2000). [25] T. Callaghan, M. Porter, and P. Mucha, accepted in American Mathematical Monthly (2003), physics/0310148. [26] J. Park and M. E. J. Newman, J. Stat. Mech. P10014 (2005), physics/0505169. [27] M. T. Gastner and M. E. J. Newman, The European Physical Journal B 49, 247 (2006). [28] L. da F. Costa and L. A. Diambra, Physical Review E 71, 021901 (2005). [29] Indeed, here we specify short cycles as those of length 3, 4, or 5 but this is not a set rule and, in certain circumstances, it may prove advantageous to consider 4- or 5-cycles, or even just 5-cycles. [30] Data taken from published schedule at http://www.ncaa.org