
Vertex Cover in Complex Networks

2013, International Journal of Modern Physics C


International Journal of Modern Physics C
© World Scientific Publishing Company

VERTEX COVER IN COMPLEX NETWORKS

MARIANA O. DA SILVA
DAINF, Universidade Tecnológica Federal do Paraná
Rua Sete de Setembro, 3164, Curitiba-PR, CEP 80230-901, Brazil
[email protected]

GUSTAVO A. GIMENEZ-LUGO
DAINF, Universidade Tecnológica Federal do Paraná
Rua Sete de Setembro, 3164, Curitiba-PR, CEP 80230-901, Brazil
[email protected]

MURILO V. G. DA SILVA
DAINF, Universidade Tecnológica Federal do Paraná
Rua Sete de Setembro, 3164, Curitiba-PR, CEP 80230-901, Brazil
[email protected]

Received Day Month Year
Revised Day Month Year

A Minimum Vertex Cover is the smallest set of vertices whose removal completely disconnects a graph. In this paper we perform experiments on a number of graphs from standard complex networks databases, addressing the problem of finding a “good” vertex cover (finding an optimum is an NP-hard problem). In particular, we take advantage of the ubiquitous power law distribution present in many complex networks. In our experiments we show that by running a greedy algorithm on a power law graph we can obtain a very small vertex cover, typically about 1.02 times the theoretical optimum. This is an interesting practical result since theoretically we know that: (1) In a general graph on n vertices a greedy approach cannot guarantee a factor better than ln n; (2) The best approximation algorithm known at the moment is very involved and has a much larger factor of $2 - \Theta(1/\sqrt{\log n})$. In fact, in the context of approximation within a constant factor, it is conjectured that there is no (2 − ϵ)-approximation algorithm for the problem; (3) Even restricted to power law graphs and probabilistic guarantees, the best known approximation rate is 1.5.

Keywords: Complex networks; power law graphs; vertex cover; greedy algorithms.

PACS Nos.: 11.25.Hf, 123.1K

1. Introduction

The study of large real world graphs, also commonly called complex networks, has grown enormously in the last decade, bringing together physicists, mathematicians, computer scientists and many other researchers [1]. In this area, much experimental work has been performed [2,3,4,5,6,7], and a number of analytical models have been proposed by the mathematics and theoretical computer science communities [8]. A crucial point in this field is that many networks from different domains (social, economic, technological and biological networks, etc.) share some common properties. Some of the most notable common properties among a variety of complex networks are small diameter (the “small world phenomenon”), the power law distribution (see Table 1 for a list with a few examples of power law graphs according to [10]) and high clustering coefficient, among a few others [10,9].

In this paper, generally speaking, we are interested in the following (quite broad) question: can we exploit these properties, common to most real world networks, in order to obtain more efficient algorithms for combinatorial optimization problems? This is an interesting practical question that still has to be explored more thoroughly. In our work we pick a particular optimization problem, the Minimum Vertex Cover, and conduct experiments taking advantage of one particular property of complex networks: the power law distribution. We define these concepts in the next section.

Table 1. Examples of power law networks according to Barabási and Bonabeau [9].

| Networks | Nodes | Links |
|---|---|---|
| Hollywood | Actors | Appearance in the same movie |
| Internet | Routers | Optical and physical connections |
| Protein regulatory network | Proteins | Interactions among proteins |
| Research collaborations | Scientists | Co-authorship of papers |
| Sexual relationships | People | Sexual contact |
| World Wide Web | Web pages | URLs |
| Cellular metabolism | Molecules involved in burning food for energy | Participation in the same biochemical reaction |

The main point of our work is that, since power law graphs contain a few very large “hubs”, i.e., vertices of very high degree, one might naturally expect that a greedy approach (i.e., choosing these hubs first for the covering) would perform well. Although this idea is quite natural, the authors are not aware of any experimental or analytical work confirming this expectation. More importantly, a question that, as far as we are aware, has also not been answered is how well (quantitatively) a greedy algorithm performs on real world power law networks. We have run such experiments on a number of graphs from different domains, taken from standard complex networks databases, and we obtained coverings very close to the optimum, more precisely around 1.02 times the optimum (the largest value obtained was 1.05). We present this and other related results in Section 4. In Sections 2 and 3 we define the problem and discuss how the experimental results obtained in this paper relate to the theoretical guarantees for approximation algorithms for the vertex cover problem.

2. Power Law Graphs and Combinatorial Optimization

Barabási and Albert [10], and also others [8], observed that in many real-world graphs the vertex degree sequence has a power law distribution, i.e., the fraction of vertices with degree d is proportional to $d^{-\lambda}$, where λ is a constant independent of the size of the graph. The authors modelled such real world networks using a preferential attachment scheme which became well known in the field. Other researchers proposed a number of different approaches to model such networks [8]. Experimental work in the field indicated that the exponent λ mentioned above is 2.9 ± 0.1, and in 2001 Bollobás et al. [11] presented an analytical argument that this parameter is in fact 3.
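To make the definition concrete, the following minimal sketch estimates the exponent λ of a degree sequence by a least-squares fit on the log-log degree histogram. It is an illustration only: the function names and the synthetic degree sequence are assumptions of this sketch, not the procedure used in the paper, and a plain log-log fit is only a rough estimator on real, noisy data.

```python
import math
from collections import Counter


def degree_fractions(degrees):
    """Map each degree d >= 1 to the fraction of vertices having that degree."""
    counts = Counter(d for d in degrees if d >= 1)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}


def estimate_lambda(degrees):
    """Negated least-squares slope of log(fraction) against log(degree)."""
    fractions = degree_fractions(degrees)
    xs = [math.log(d) for d in fractions]
    ys = [math.log(f) for f in fractions.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct degrees")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Hypothetical degree sequence whose counts are exactly proportional to d**-3.
synthetic = [1] * 1000 + [2] * 125 + [5] * 8 + [10] * 1
lambda_hat = estimate_lambda(synthetic)
print(round(lambda_hat, 3))
```

On real networks the tail of the histogram is sparse, which is why binned plots are often used alongside the raw log-log plot.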
In our paper we use the common designation “Power Law Graph” for graphs whose vertex degree sequence has such a distribution. In Figure 1 we give two examples of the vertex degree distribution from networks used in our experiments.

Fig. 1. Vertex degree distribution. Panels: (a) Cpan authors (log-log plot); (b) Cpan authors (binned log-log plot); (c) Oclinks (log-log plot); (d) Oclinks (binned log-log plot). The x-axis is the degree, from 0 to the maximum value, and the y-axis is the number of vertices of that degree.

In light of the fact that many real world networks are power law graphs, many researchers in the field have been exploring this property structurally and algorithmically [12,13,14]. Recently, some work has also been done on power law graphs aiming to lower the average factor of approximation algorithms for NP-hard combinatorial problems [15,16,17]. To be more precise, in the case of the vertex cover problem, although it is believed that 2 is the smallest constant factor achievable by an approximation algorithm, in the particular case of power law graphs Gast and Hauptmann propose an algorithm which outputs a vertex cover whose expected size is not larger than 1.5 times the optimum [15]. These analytical results are in line with what we observe experimentally in this paper. In particular, we show that in practice simpler algorithms can achieve much smaller approximation factors.

3. The vertex cover problem

A vertex cover of a graph is a set of vertices whose removal completely disconnects the graph (see footnote a). A Minimum Vertex Cover is a smallest set with this property. Figure 2 depicts an example of a minimum vertex cover in a graph. This is a classical NP-hard problem [18] which remains intractable even for cubic graphs and for planar graphs with maximum degree at most three [18]. Therefore approximation algorithms for finding “good” solutions in polynomial time are of great interest.
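The definition above can be checked mechanically. The sketch below is a hedged illustration, with hypothetical function names and a hypothetical toy graph: it tests the equivalent "every edge has an end in S" condition and finds a minimum cover by trying subsets in order of increasing size. The exhaustive search is exponential in the number of vertices and is meant only to show why exact solutions become impractical; it is not the exact algorithm used in the paper's experiments.

```python
from itertools import combinations


def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in `cover`."""
    cover = set(cover)
    return all(u in cover or v in cover for u, v in edges)


def minimum_vertex_cover_bruteforce(edges):
    """Try vertex subsets in order of increasing size (exponential time)."""
    vertices = sorted({v for edge in edges for v in edge})
    for size in range(len(vertices) + 1):
        for candidate in combinations(vertices, size):
            if is_vertex_cover(edges, candidate):
                return set(candidate)
    return set(vertices)  # not reached: the full vertex set is always a cover


# Hypothetical toy graph: the path 0-1-2-3 plus the edge 1-4.
toy_edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
best = minimum_vertex_cover_bruteforce(toy_edges)
print(best, is_vertex_cover(toy_edges, best))
```

Removing the vertices of `best` from the toy graph leaves no edges, which is exactly the "completely disconnected" condition of the definition.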
We discuss in the next section the two most common approximation approaches in this direction.

Fig. 2. Example of a Minimum Vertex Cover of a graph G. Panels: (a) graph G with 15 vertices and 20 edges; (b) the minimum vertex cover of G is the set of highlighted vertices. If the vertices highlighted in (b) are removed from G, the resulting graph has no edges.

Footnote a: A completely disconnected graph is a graph with no edges. Alternatively, a vertex cover S is a set of vertices such that every edge in the graph has an end in S.

3.1. Simple ideas for approximation algorithms

• Greedy Algorithm: Given a graph G, pick the vertex v with the highest degree, insert it in the cover S and remove v from G. Repeat the same strategy on the remaining graph until there are no more edges. This approach always finds a vertex cover S such that |S| is at most ln n times the optimum, where n is the number of vertices in the input graph G. For a general graph this upper bound is tight [19].

• 2-Approximation Algorithm: Given a graph G, pick an arbitrary edge uv, insert uv in the set M and remove both u and v from G. Repeat the same strategy on the remaining graph until there are no more edges (see footnote b). In the next step, include in the vertex cover S both ends of each edge in the set M. This approach leads to a cover S of size at most twice the optimum [19] (i.e., this is an approximation algorithm with factor 2).

The 2-approximation algorithm presented above has been known for about 40 years and it is still one of the best known for general graphs [20]. It is known that no polynomial-time algorithm can guarantee a factor better than 1.36 [21] (unless P=NP). In fact, an even stronger assertion is conjectured: for any constant ϵ > 0, there is no (2 − ϵ)-approximation algorithm for the problem [22]. Furthermore, even for non-constant guarantees, very small improvements on the factor 2 require very involved algorithms.
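The two procedures of Section 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is assumed to be simple (no self-loops) and given as a list of edges, ties between vertices of equal degree are broken arbitrarily, and the function names and the example graph are hypothetical.

```python
def greedy_vertex_cover(edges):
    """Repeatedly move a vertex of highest remaining degree into the cover."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    cover = set()
    while any(adjacency.values()):  # some edge is still uncovered
        hub = max(adjacency, key=lambda x: len(adjacency[x]))
        for neighbour in adjacency.pop(hub):  # remove hub and its edges
            adjacency[neighbour].discard(hub)
        cover.add(hub)
    return cover


def matching_vertex_cover(edges):
    """Take both endpoints of every edge of a greedily built maximal matching."""
    cover = set()
    for u, v in edges:
        # The edge joins the matching only if neither endpoint is used yet.
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


# Hypothetical "double star": hubs 0 and 1 are adjacent, each with three leaves.
double_star = [(0, 2), (0, 3), (0, 4), (1, 5), (1, 6), (1, 7), (0, 1)]
print(len(greedy_vertex_cover(double_star)))
print(len(matching_vertex_cover(double_star)))
```

On this small graph the greedy routine selects only the two hubs, while the matching-based routine also pays for one leaf per matched edge; this is the kind of gap one would expect hubs to create. The linear scan for the maximum degree keeps the sketch short; a priority queue would be the natural choice for larger inputs.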
At the moment the best known approximation algorithm, proposed by Karakostas [23], has a factor of $2 - \Theta(1/\sqrt{\log n})$, which is much larger than the numbers that we obtained experimentally using the greedy approach. In the domain of power law graphs, Gast and Hauptmann recently proposed an algorithm and proved that it outputs on average a vertex cover of size at most 1.5 times the optimum on the (α, β)-model of power law graphs proposed by Aiello [15]. This result is the only one that we found on the topic of vertex cover in power law graphs.

Footnote b: Note that at the end of this process M is a maximal matching in G, i.e., M is a set of edges in G such that for every distinct e, e′ ∈ M, edges e and e′ have no endnodes in common, and there is no other matching M′ in G such that M ⊂ M′.

4. Results for Real Networks and Discussion

In this section we present our experimental results addressing the minimum vertex cover problem on 25 power law complex networks from well known graph databases [24,25,26]. The sizes of the networks range from a few hundred to about 75000 vertices. For 15 of these networks we had the value of the optimum cover for comparison. On these 15 networks our experiments show that a greedy approach outputs a typical vertex cover of size around 1.02 times the optimum. The sizes of the coverings found by the greedy algorithm are shown in the column named Greedy in Table 2 and the rate of approximation in the column named Greedy/Opt in Table 3. Theoretically a greedy approach can output results with a factor as bad as ln n, so the point here is that the “obstructions” which would lead to a larger approximation factor are not typical in power law graphs. For that reason the greedy approach does very well in practice.

We also implemented the 2-approximation algorithm discussed in Section 3.1 and obtained significantly worse results than with the greedy algorithm.
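The ratios reported in Table 3 follow directly from the cover sizes in Table 2. As a small worked example, the sketch below recomputes them for three rows copied from Table 2 and prints ln n next to them for comparison with the worst-case greedy guarantee. The helper name `approximation_ratios` is an assumption of this sketch.

```python
import math


def approximation_ratios(opt, greedy, two_approx):
    """Return (greedy/opt, 2-approx/opt, greedy/2-approx) for one network."""
    return greedy / opt, two_approx / opt, greedy / two_approx


# (network, vertices, optimum, greedy, 2-approximation), copied from Table 2.
rows = [
    ("Airlines", 235, 96, 97, 146),
    ("Codeminer", 724, 191, 196, 334),
    ("Wiki-Vote1", 7115, 2249, 2370, 3460),
]

for name, n, opt, greedy, two_approx in rows:
    g_opt, a_opt, g_a = approximation_ratios(opt, greedy, two_approx)
    print(
        f"{name}: greedy/opt = {g_opt:.2f}, 2-approx/opt = {a_opt:.2f}, "
        f"greedy/2-approx = {g_a:.2f}, ln n = {math.log(n):.2f}"
    )
```

Even for the smallest of these networks ln n is above 5, so the observed greedy ratios sit far below what the worst-case analysis alone would allow.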
A typical vertex cover obtained by the 2-approximation algorithm was around 1.56 times the optimum. Again, these results were obtained from the 15 graphs for which we had the optimum for comparison. These results are presented in Tables 2 and 3.

We note that we ran both the greedy algorithm and the 2-approximation algorithm on all 25 networks (i.e., not only on the networks where we had the optimum for comparison), and the results show that the greedy approach obtained a typical vertex cover of size 0.66 times the size of the cover produced by the competing algorithm. These results are presented in the column named Greedy/2-App in Table 3. We included Table 4 to refer to the well known databases from which the networks used in our experiments were obtained. For the sake of completeness we also included the number of edges of each network.

In order to obtain the optimum value for the vertex cover, we used the well known SAGE Library [27]. We managed to obtain minimum coverings for graphs with up to about 10000 vertices, but for the 10 largest graphs we were not able to obtain these values due to the exponential nature of the algorithm. It is still somewhat surprising that the optimum value could be computed for graphs with about 10000 vertices. It is therefore worth pointing out that, although the exact algorithm is exponential, the intricate way in which it explores particular graphs made successful executions possible on relatively large inputs (see footnote c).

Finally, we would like to add two remarks:

(1) In the literature of complex networks, in particular in the context of network robustness, some authors refer to the sequence of vertices obtained by the greedy strategy described in this paper as a high degree adaptive attack [28].
(2) As we have mentioned before, in our experiments we also computed the ratio between the size of the cover obtained by the greedy algorithm and the size of the cover obtained by the 2-approximation algorithm. The results show that the cover obtained by the greedy algorithm is about 66% of the size of the cover obtained by the other approach (Table 3). It should also be pointed out that this number seems to be invariant to the size of the graph. This fact might give us some indication that even for the 10 largest graphs (for which the optimum values were not available for comparison) the size of the coverings obtained by the greedy algorithm might still be very close to the optimum.

Footnote c: We ran our experiments on a Dell R710 server with an Intel Xeon Quad Core 5500 y 5600, 12GB of physical memory and 60GB of swap memory. The smallest graph for which the SAGE exponential approach “crashed” had been running for about 12 hours and stopped due to lack of memory.

Table 2. Algorithms for Vertex Cover in Complex Networks. (1) The first column is the ID of the graph; we use it to refer to the same graphs in the other tables. (2) The second column is the name of the file available in the databases. (3) The third column is the number of vertices of the network. (4) The fourth column is the size of the minimum vertex cover. We used the algorithm from the SAGE Library for computing these values. Data for larger graphs could not be obtained due to the exponential nature of the algorithm (see Section 4 for details). (5) The fifth column is the size of the vertex cover obtained by the greedy algorithm implemented by us. (6) The sixth column is the size of the vertex cover obtained by the 2-approximation algorithm implemented by us. (7) The seventh column indicates the type of the network.
| Net ID | Network file | # of vertices | Opt | Greedy | 2-Approx | Network type |
|---|---|---|---|---|---|---|
| 1 | Airlines | 235 | 96 | 97 | 146 | Technological |
| 2 | US Air | 332 | 149 | 151 | 230 | Technological |
| 3 | Codeminer | 724 | 191 | 196 | 334 | Social Net. |
| 4 | Cpan authors | 839 | 116 | 117 | 196 | Social Net. |
| 5 | EuroSis | 1285 | 597 | 608 | 896 | Information |
| 6 | Oclinks | 1899 | 749 | 763 | 1100 | Information |
| 7 | YeastS | 2284 | 763 | 773 | 1240 | Biologic |
| 8 | CA-GrQc | 5241 | 2783 | 2795 | 3960 | Citation |
| 9 | p2p-Gnutella08 | 6301 | 2054 | 2070 | 3366 | Communication |
| 10 | Wiki-Vote1 | 7115 | 2249 | 2370 | 3460 | Social Net. |
| 11 | p2p-Gnutella09 | 8114 | 2574 | 2589 | 4238 | Information |
| 12 | p2p-Gnutella06 | 8717 | 3405 | 3484 | 5352 | Information |
| 13 | p2p-Gnutella05 | 8846 | 3428 | 3475 | 5412 | Information |
| 14 | CA-HepTh | 9875 | 4981 | 5003 | 7240 | Citation |
| 15 | p2p-Gnutella04 | 10876 | 4348 | 4428 | 6624 | Collaboration |
| 16 | CA-AstroPh | 18771 | n.a. | 12044 | 15194 | Collaboration |
| 17 | p2p-Gnutella25 | 22687 | n.a. | 6055 | 9800 | Information |
| 18 | CA-CondMat | 23133 | n.a. | 13561 | 18150 | Collaboration |
| 19 | p2p-Gnutella24 | 26518 | n.a. | 7250 | 11624 | Information |
| 20 | Cit-HepTh | 27769 | n.a. | 18225 | 23396 | Citation |
| 21 | p2p-Gnutella30 | 36682 | n.a. | 9321 | 15096 | Information |
| 22 | Email-Enron | 36692 | n.a. | 14477 | 20674 | Communication |
| 23 | Brightkite-edges | 58228 | n.a. | 22177 | 34814 | Social Net. |
| 24 | p2p-Gnutella31 | 62586 | n.a. | 15864 | 25582 | Information |
| 25 | soc-Epinions1 | 75879 | n.a. | 22418 | 35964 | Social Net. |

Table 3. (1) The first column is the graph ID. (2) The second column shows the rate of approximation obtained by the greedy algorithm (i.e., the value obtained by the algorithm divided by the optimum). (3) The third column shows the rate of approximation obtained by the 2-approximation algorithm. (4) The fourth column shows the size of the cover obtained by the greedy algorithm divided by the size of the cover obtained by the 2-approximation algorithm. At the bottom of the table we include the average and the standard deviation for each column.
| ID | Greedy/Opt | 2-App/Opt | Greedy/2-App |
|---|---|---|---|
| 1 | 1.01 | 1.52 | 0.66 |
| 2 | 1.01 | 1.54 | 0.66 |
| 3 | 1.03 | 1.75 | 0.59 |
| 4 | 1.01 | 1.69 | 0.60 |
| 5 | 1.02 | 1.50 | 0.68 |
| 6 | 1.02 | 1.47 | 0.69 |
| 7 | 1.01 | 1.63 | 0.62 |
| 8 | 1.00 | 1.42 | 0.71 |
| 9 | 1.01 | 1.64 | 0.61 |
| 10 | 1.05 | 1.54 | 0.68 |
| 11 | 1.01 | 1.65 | 0.61 |
| 12 | 1.02 | 1.57 | 0.65 |
| 13 | 1.01 | 1.58 | 0.64 |
| 14 | 1.00 | 1.45 | 0.69 |
| 15 | 1.02 | 1.52 | 0.67 |
| 16 | n.a. | n.a. | 0.79 |
| 17 | n.a. | n.a. | 0.62 |
| 18 | n.a. | n.a. | 0.75 |
| 19 | n.a. | n.a. | 0.62 |
| 20 | n.a. | n.a. | 0.78 |
| 21 | n.a. | n.a. | 0.62 |
| 22 | n.a. | n.a. | 0.70 |
| 23 | n.a. | n.a. | 0.64 |
| 24 | n.a. | n.a. | 0.62 |
| 25 | n.a. | n.a. | 0.62 |
| average | 1.02 | 1.56 | 0.66 |
| st. dev. | 0.01 | 0.09 | 0.05 |

Table 4. In this table we indicate the databases from which the networks were obtained. For the sake of completeness we also include the size of the vertex set and edge set for each network.

| ID | Database | Vertices | Edges |
|---|---|---|---|
| 1 | Gephi | 235 | 1295 |
| 2 | Pajek | 332 | 2126 |
| 3 | Gephi | 724 | 1015 |
| 4 | Gephi | 839 | 2112 |
| 5 | Gephi | 1285 | 6462 |
| 6 | Gephi | 1899 | 13821 |
| 7 | Pajek | 2284 | 6646 |
| 8 | Stanford | 5241 | 14484 |
| 9 | Stanford | 6301 | 20776 |
| 10 | Stanford | 7115 | 100729 |
| 11 | Stanford | 8114 | 26013 |
| 12 | Stanford | 8717 | 31523 |
| 13 | Stanford | 8846 | 31837 |
| 14 | Stanford | 9875 | 25973 |
| 15 | Stanford | 10876 | 39993 |
| 16 | Stanford | 18771 | 198050 |
| 17 | Stanford | 22687 | 54705 |
| 18 | Stanford | 23133 | 93439 |
| 19 | Stanford | 26518 | 65368 |
| 20 | Stanford | 27769 | 352285 |
| 21 | Stanford | 36682 | 88328 |
| 22 | Stanford | 36692 | 183811 |
| 23 | Stanford | 58228 | 214023 |
| 24 | Stanford | 62586 | 147890 |
| 25 | Stanford | 75879 | 404953 |

References

1. D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press (2010).
2. A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. S. Tomkins and J. Wiener. Graph structure in the Web. Computer Networks, 33(1-6):309-320, June 2000.
3. J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan and A. S. Tomkins. The Web as a graph: measurements, models and methods. Proceedings of the 5th Annual International Conference on Computing and Combinatorics, 1-17, 1999.
4. M. Faloutsos, P. Faloutsos and C. Faloutsos. On power-law relationships of the internet topology. ACM SIGCOMM Computer Communication Review, 29(4):251-262, 1999.
5. M. Jovanović, F. S. Annexstein and K. A. Berman. Modeling peer-to-peer network topologies through “small-world” models and power laws. In IX Telecommunications Forum, TELFOR, 2001.
6. N. Guelzim, S. Bottani, P. Bourgine and F. Képès. Topological and causal structure of the yeast transcriptional regulatory network. Nature Genetics, 31(1):60-63, 2002.
7. S. Eubank, V. S. A. Kumar, M. V. Marathe, A. Srinivasan and N. Wang. Structural and algorithmic aspects of massive social networks. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 718-727. SIAM, 2004.
8. L. Lu and F. Chung. Complex Graphs and Networks. American Math. Society (2006).
9. A.L. Barabási and E. Bonabeau. Scale-Free Networks. Scientific American 288, 50-59, 2003.
10. A.L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509, 1999.
11. B. Bollobás, O. Riordan, J. Spencer and G. Tusnády. The Degree Sequence of a Scale-Free Random Graph Process. Random Structures & Algorithms, Vol. 18, Issue 3, 279-290 (2001).
12. K. Park and H. Lee. On the effectiveness of route-based packet filtering for distributed DoS attack prevention in power-law internets. ACM SIGCOMM Computer Communication Review, 31(4):15-26, October 2001.
13. C. Gkantsidis, M. Mihail and A. Saberi. Conductance and congestion in power law graphs. SIGMETRICS Performance Evaluation Review, 31:148-159, 2003.
14. M. Mihail, C. H. Papadimitriou and A. Saberi. On certain connectivity properties of the internet topology. Journal of Computer and System Sciences, 72(2):239-251, 2006.
15. M. Gast and M. Hauptmann. Approximability of the Vertex Cover Problem in Power Law Graphs. Computing Research Repository (CoRR), arXiv:1204.0982, 2012.
16. M. Hauptmann, M. Gast and M. Karpinski. Inapproximability of Dominating Set in Power Law Graphs. Computing Research Repository (CoRR), arXiv:1212.3517, 2012.
17. M. Hauptmann, M. Gast and M. Karpinski. Improved Approximation Lower Bounds for Vertex Cover on Power Law Graphs and Some Generalizations. Computing Research Repository (CoRR), arXiv:1210.2698, 2012.
18. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY (1979).
19. C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice Hall (1982).
20. S. Dasgupta, C.H. Papadimitriou and U.V. Vazirani. Algorithms. McGraw-Hill, 2008.
21. I. Dinur and S. Safra. On the hardness of approximating vertex cover. Annals of Mathematics, Volume 162, pages 439-485 (2005).
22. S. Khot and O. Regev. Vertex cover might be hard to approximate to within 2 − ϵ. Journal of Computer and System Sciences, Volume 74, Issue 3, May 2008, pages 335-349.
23. G. Karakostas. A better approximation ratio for the vertex cover problem. ACM Transactions on Algorithms, Volume 5, Issue 4, Article No. 41, 2009.
24. GEPHI Datasets, http://wiki.gephi.org/index.php/Datasets
25. PAJEK Datasets, http://vlado.fmf.uni-lj.si/pub/networks/data
26. Stanford Large Network Dataset Collection, http://snap.stanford.edu/data
27. SAGE, http://www.sagemath.org/library.html
28. C. M. Schneider, A. A. Moreira, J. S. Andrade Jr., S. Havlin and H. J. Herrmann. Onion-like Network Topology Enhances Robustness against Malicious Attacks. Journal of Statistical Mechanics: Theory and Experiment (2011) P01027.