International Journal of Modern Physics C
© World Scientific Publishing Company
VERTEX COVER IN COMPLEX NETWORKS
MARIANA O. DA SILVA
DAINF, Universidade Tecnológica Federal do Paraná
Rua Sete de Setembro, 3164, Curitiba-PR, CEP 80230-901, Brazil
[email protected]
GUSTAVO A. GIMENEZ-LUGO
DAINF, Universidade Tecnológica Federal do Paraná
Rua Sete de Setembro, 3164, Curitiba-PR, CEP 80230-901, Brazil
[email protected]
MURILO V. G. DA SILVA
DAINF, Universidade Tecnológica Federal do Paraná
Rua Sete de Setembro, 3164, Curitiba-PR, CEP 80230-901, Brazil
[email protected]
Received Day Month Year
Revised Day Month Year
A minimum vertex cover is a smallest set of vertices whose removal completely disconnects a graph. In this paper we perform experiments on a number of graphs from
standard complex networks databases, addressing the problem of finding a “good” vertex
cover (finding an optimum is an NP-hard problem). In particular, we take advantage of
the ubiquitous power law distribution present in many complex networks. Our experiments show that running a greedy algorithm on a power law graph yields
a very small vertex cover, typically about 1.02 times the theoretical optimum. This is
an interesting practical result, since theoretically we know that: (1) in a general graph
on n vertices a greedy approach cannot guarantee a factor better than ln n; (2) the
best approximation algorithm known at the moment is very involved and has a much
larger factor of $2 - \Theta(1/\sqrt{\log n})$. In fact, in the context of approximation within a constant
factor, it is conjectured that there is no (2 − ϵ)-approximation algorithm for the problem;
(3) even restricted to power law graphs and probabilistic guarantees, the best known
approximation rate is 1.5.
Keywords: Complex networks; power law graphs; vertex cover; greedy algorithms.
PACS Nos.: 11.25.Hf, 123.1K
1. Introduction
The study of large real-world graphs, also commonly called complex networks,
has grown enormously in the last decade, bringing together physicists, mathematicians, computer scientists and many other researchers 1. In this area, much experimental work
has been performed 2,3,4,5,6,7, and a number of analytical
models have been proposed by the community closer to mathematics and theoretical
computer science 8.
A crucial point in this field is that many networks from different domains (social,
economic, technological and biological networks, etc.) share some common properties. Among the most notable are small diameter (the “small world phenomenon”), the power law
degree distribution (see Table 1 for a few examples of power law graphs according
to 9), and a high clustering coefficient, among a few others 10,9.
In this paper, generally speaking, we are interested in the following (quite broad)
question: can we exploit these properties common to most real-world networks in
order to obtain more efficient algorithms for combinatorial optimization problems?
This is an interesting practical question that has yet to be explored more thoroughly. In our work we pick a particular optimization problem, Minimum Vertex Cover, and conduct experiments taking advantage of one particular property
of complex networks: the power law distribution. We define these concepts in the
next two sections.
Table 1. Examples of power law networks according to Barabási and Bonabeau 9.
Network                      Nodes                                           Links
Hollywood                    Actors                                          Appearance in the same movie
Internet                     Routers                                         Optical and physical connections
Protein regulatory network   Proteins                                        Interactions among proteins
Research collaborations      Scientists                                      Co-authorship of papers
Sexual relationships         People                                          Sexual contact
World Wide Web               Web pages                                       URLs
Cellular metabolism          Molecules involved in burning food for energy   Participation in the same biochemical reaction
The main point of our work is that, since power law graphs contain a few very
large “hubs”, i.e., vertices of very high degree, one might naturally expect a
greedy approach (i.e., choosing these hubs first for the cover) to perform
well. Although this idea is quite natural, the authors are not aware of
any experimental or analytical work confirming this expectation. More importantly,
a question that, as far as we are aware, has also not been answered is how
well (quantitatively) a greedy algorithm performs on real-world power law
networks. We have run such experiments on a number of graphs from different
domains, taken from standard complex networks databases, and obtained covers very
close to the optimum: around 1.02 times the optimum on average (the largest
value obtained was 1.05). We present this and other related results in Section
4. In Sections 2 and 3 we define the problem and discuss how the experimental
results obtained in this paper relate to the theoretical guarantees of approximation
algorithms for the vertex cover problem.
Vertex Cover in Complex Networks:Experimental Results
2. Power Law Graphs and Combinatorial Optimization
Barabási and Albert 10 and also others 8 observed that in many real-world graphs
the vertex degree sequence has a power law distribution, i.e., the fraction of vertices
with degree d is proportional to $d^{-\lambda}$, where λ is a constant independent of the size
of the graph. The authors modelled such real-world networks using a preferential attachment scheme, which became well known in the field. Other researchers have proposed
a number of different approaches to model such networks 8. Experimental
work in the field indicated that the λ mentioned above is 2.9 ± 0.1, and Bollobás et al. 11
presented in 2001 an analytical argument showing that this parameter is in fact
3. In our paper we use the common designation “power law graph” for graphs
whose vertex degree sequence has such a distribution. In Figure 1 we give two examples of the vertex degree distribution of networks used in our experiments.
Fig. 1. Vertex degree distribution: (a) Cpan authors (log-log plot); (b) Cpan authors (binned log-log plot); (c) Oclinks (log-log plot); (d) Oclinks (binned log-log plot). The x-axis is the vertex degree, from 0 to the maximum value,
and the y-axis is the number of vertices of that degree.
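The degree distribution described above can be tabulated directly from an edge list. The following is a minimal sketch, not the procedure used to produce Figure 1: the function names `degree_histogram` and `estimate_exponent` are hypothetical, the graph is assumed to be simple and undirected, and the exponent is estimated by an ordinary least-squares fit on the log-log histogram, which is only a rough estimator of λ.

```python
import math
from collections import Counter


def degree_histogram(edges):
    """Map each degree d to the number of vertices of degree d.

    `edges` is an iterable of (u, v) pairs of a simple undirected graph
    (an assumption of this sketch); self-loops are ignored.
    """
    degree = Counter()
    for u, v in edges:
        if u == v:
            continue
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def estimate_exponent(histogram):
    """Rough estimate of lambda: minus the least-squares slope of
    log(count) against log(degree)."""
    points = [(math.log(d), math.log(c))
              for d, c in histogram.items() if d > 0 and c > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic histogram following count = 1000 * d**(-3) exactly.
    print(estimate_exponent({1: 1000, 2: 125, 10: 1}))
```

On real data the tail of the histogram is noisy, which is why binned plots such as panels (b) and (d) of Figure 1 are commonly used.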
In light of the fact that many real-world networks are in fact power law
graphs, many researchers in the field have been exploring this property structurally
and algorithmically 12,13,14. Recently, some work has also been done on power law
graphs aiming to lower the average factor of approximation algorithms for NP-hard combinatorial problems 15,16,17. To be more precise, in the case of the vertex
cover problem, although it is believed that 2 is the smallest constant factor achievable by
an approximation algorithm, in the particular case of power law graphs Gast and
Hauptmann proposed an algorithm which outputs a vertex cover of expected size not
larger than 1.5 times the optimum 15. These analytical results are in line
with what we observe experimentally in this paper. In particular, we show that in
practice simpler algorithms can achieve much smaller approximation factors.
3. The vertex cover problem
A vertex cover of a graph is a set of vertices whose removal completely disconnects the graph (see footnote a).
A minimum vertex cover is a smallest set with this property. Figure 2 depicts an
example of a minimum vertex cover in a graph. Finding a minimum vertex cover is a classical NP-hard
problem 18, which remains intractable even for cubic graphs and for planar graphs with
maximum degree at most three 18. Therefore approximation algorithms for finding
“good” solutions in polynomial time are of great interest. In the next
section we discuss the two most common approximation approaches in this direction.
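The definition can be checked mechanically: a set S is a vertex cover exactly when every edge has at least one end in S. The sketch below illustrates this on a small path; the helper name `is_vertex_cover` and the example graph are assumptions made for illustration and do not come from the paper.

```python
def is_vertex_cover(edges, cover):
    """Return True if every edge (u, v) has at least one end in `cover`."""
    cover = set(cover)
    return all(u in cover or v in cover for u, v in edges)


if __name__ == "__main__":
    # Path a - b - c - d: {b, c} covers all three edges, {b} misses edge (c, d).
    path = [("a", "b"), ("b", "c"), ("c", "d")]
    print(is_vertex_cover(path, {"b", "c"}))
    print(is_vertex_cover(path, {"b"}))
```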
Fig. 2. Example of a minimum vertex cover of a graph G: (a) graph G with 15 vertices and 20 edges; (b) the minimum vertex cover of G is the set of highlighted vertices. If the set of vertices highlighted in
(b) is removed from G, the resulting graph has no edges.
a A completely disconnected graph is a graph with no edges. Alternatively a vertex cover S is a set
of vertices such that every edge in the graph has an end in S.
3.1. Simple ideas for approximation algorithms
• Greedy Algorithm: Given a graph G, pick a vertex v of highest
degree, insert it in the cover S and remove v from G. Repeat the same
step on the remaining graph until there are no more edges.
This approach always finds a vertex cover S such that |S| is at most ln n
times the optimum, where n is the number of vertices of the input graph G.
For general graphs this upper bound is tight 19.
• 2-Approximation Algorithm: Given a graph G, pick an arbitrary edge
uv, insert uv in the set M and remove both u and v from G. Repeat the
same step on the remaining graph until there are no more
edges (see footnote b). Then include in the vertex cover S both ends of each
edge in the set M. This approach leads to a cover S of size at most twice
the optimum 19 (i.e., this is an approximation algorithm with factor 2).
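The two procedures above can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used in the experiments: the function names are hypothetical, the graph is assumed to be given as an undirected edge list, and the greedy loop rescans all vertices at every step, so a priority queue would be preferable for the larger networks.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Adjacency sets of a simple undirected graph; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def greedy_vertex_cover(edges):
    """Repeatedly move a vertex of highest remaining degree into the cover."""
    adj = build_adjacency(edges)
    cover = set()
    while adj:
        v = max(adj, key=lambda x: len(adj[x]))
        if not adj[v]:
            break  # the highest degree is 0, so no edges remain
        cover.add(v)
        for w in adj.pop(v):  # remove v from the graph
            adj[w].discard(v)
    return cover


def matching_vertex_cover(edges):
    """2-approximation: take both ends of each edge of a maximal matching."""
    cover = set()
    for u, v in edges:
        # The edge joins the matching only if neither end is matched yet.
        if u != v and u not in cover and v not in cover:
            cover.update((u, v))
    return cover


if __name__ == "__main__":
    # A star with centre 0: the hub alone covers every edge.
    star = [(0, 1), (0, 2), (0, 3), (0, 4)]
    print(greedy_vertex_cover(star))
    print(matching_vertex_cover(star))
```

On the star, the greedy strategy picks only the hub, while the matching-based algorithm takes both ends of the first edge it scans; this is the kind of difference one would expect to matter in graphs with a few high-degree hubs.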
The 2-approximation algorithm presented above has been known for about 40 years
and it is still one of the best known for general graphs 20. It is known
that no polynomial-time algorithm can guarantee a factor better than 1.36 21 (unless P=NP). In
fact, an even stronger assertion is conjectured: for any constant ϵ > 0, there is no
(2 − ϵ)-approximation algorithm for the problem 22. Furthermore, even for non-constant guarantees, very small improvements on the factor 2 require very involved
algorithms. At the moment the best known approximation algorithm, proposed by
Karakostas 23, has a factor of $2 - \Theta(1/\sqrt{\log n})$, which is much larger than the ratios
that we obtained experimentally using the greedy approach.
In the domain of power law graphs, Gast and Hauptmann recently proposed an
algorithm and proved that it outputs on average a vertex cover of size 1.5 times the
optimum on the (α, β)-model of power law graphs proposed by Aiello 15 . This result
is the only one that we found on the topic of vertex cover in power law graphs.
4. Results for Real Networks and Discussion
In this section we present our experimental results addressing the minimum vertex
cover problem on 25 power law complex networks from well-known graph databases 24,25,26.
The sizes of the networks range from a few hundred to about 75000 vertices.
For 15 of these networks we had the value of the optimum cover for comparison. On
these 15 networks our experiments show that the greedy approach typically outputs a
vertex cover of size around 1.02 times the optimum. The sizes of the covers found
by the greedy algorithm are shown in the column named Greedy in Table 2, and the
approximation ratios in the column named Greedy/Opt in Table 3. Theoretically a
greedy approach can output results with a factor as bad as ln n, so the point here
is of course that the “obstructions” which would lead to a larger approximation
factor are not typical in power law graphs. For that reason the greedy
approach does very well in practice.
b Note that at the end of this process M is a maximal matching in G, i.e., M is a set of edges in
G such that every two distinct edges e, e′ ∈ M have no endnodes in common and there is
no other matching M′ in G such that M ⊂ M′.
We also implemented the 2-approximation algorithm discussed in Section
3.1 and obtained significantly worse results than with the greedy algorithm. A typical
vertex cover obtained by the 2-approximation algorithm was around 1.56 times
the optimum. Again, these results were obtained from the 15 graphs for which we
had the optimum for comparison. These results are presented in Tables 2 and 3.
We note that we ran both the greedy algorithm and the 2-approximation algorithm
on all 25 networks (i.e., not only on the networks for which we had the optimum for
comparison), and the greedy approach typically obtained a
vertex cover of size 0.66 times that of the competing algorithm. These results are
presented in the column named Greedy/2-App in Table 3. We included Table 4 to
indicate the well-known databases from which the networks used in our experiments
were obtained. For the sake of completeness we also included the number of edges
of each network.
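The ratios quoted above can be recomputed from the cover sizes reported in Table 2. The sketch below does this for networks 1 to 15, the ones with a known optimum, and also prints ln n for each network to show how far the observed greedy ratios are from the worst-case guarantee. The variable names are hypothetical; the numbers are copied from Table 2.

```python
import math
from statistics import mean

# (number of vertices, Opt, Greedy, 2-Approx) for networks 1-15 of Table 2.
ROWS = [
    (235, 96, 97, 146), (332, 149, 151, 230), (724, 191, 196, 334),
    (839, 116, 117, 196), (1285, 597, 608, 896), (1899, 749, 763, 1100),
    (2284, 763, 773, 1240), (5241, 2783, 2795, 3960), (6301, 2054, 2070, 3366),
    (7115, 2249, 2370, 3460), (8114, 2574, 2589, 4238), (8717, 3405, 3484, 5352),
    (8846, 3428, 3475, 5412), (9875, 4981, 5003, 7240), (10876, 4348, 4428, 6624),
]

greedy_ratio = [greedy / opt for _, opt, greedy, _ in ROWS]
two_app_ratio = [two_app / opt for _, opt, _, two_app in ROWS]
worst_case = [math.log(n) for n, _, _, _ in ROWS]  # ln n bound for greedy

if __name__ == "__main__":
    print(f"mean Greedy/Opt = {mean(greedy_ratio):.2f}")
    print(f"max  Greedy/Opt = {max(greedy_ratio):.2f}")
    print(f"mean 2-App/Opt  = {mean(two_app_ratio):.2f}")
    for (n, _, _, _), ratio, bound in zip(ROWS, greedy_ratio, worst_case):
        print(f"n = {n:5d}  Greedy/Opt = {ratio:.3f}  ln n = {bound:.2f}")
```

Even for the smallest network (n = 235) the worst-case factor ln n exceeds 5, while every observed greedy ratio stays below 1.06.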
In order to obtain the optimum value for the vertex cover, we used the well-known
SAGE library 27. We managed to obtain minimum covers for graphs of up
to about 10000 vertices, but for the 10 largest graphs we were not able to obtain
these values due to the exponential nature of the algorithm. It is still somewhat surprising that the optimum value could be computed for graphs with almost 10000
vertices. It is therefore important to point out that, although the exact algorithm
is exponential, the intricate way in which it explores particular graphs made
successful executions on relatively large inputs possible (see footnote c).
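To make the exponential cost of exact computation concrete, the sketch below computes the minimum vertex cover size by plain branching: any cover must contain one end of a given edge, so both choices are tried recursively. This is not the algorithm implemented in the SAGE library, whose internals are not described here; the function name is hypothetical and the code is only practical for very small graphs.

```python
def minimum_vertex_cover_size(edges):
    """Exact minimum vertex cover size by branching on one edge.

    Worst-case running time is exponential in the size of the cover.
    """
    edges = [(u, v) for u, v in edges if u != v]  # ignore self-loops
    if not edges:
        return 0
    u, v = edges[0]
    # Either u or v must be in the cover; remove the edges it covers and recurse.
    without_u = [e for e in edges if u not in e]
    without_v = [e for e in edges if v not in e]
    return 1 + min(minimum_vertex_cover_size(without_u),
                   minimum_vertex_cover_size(without_v))


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    print(minimum_vertex_cover_size(triangle))
    print(minimum_vertex_cover_size(cycle5))
```

Each call branches in two, so the search tree can have up to 2^k leaves for a cover of size k; practical exact solvers rely on much stronger pruning.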
Finally, we would like to add two remarks: (1) In the complex networks
literature, in particular in the context of network robustness, some authors refer to
the sequence of vertices obtained by the greedy strategy described in this paper as a
high degree adaptive attack 28. (2) As mentioned before, in
our experiments we also computed the ratio between the size of the cover obtained
by the greedy algorithm and the size of the cover obtained by the 2-approximation algorithm; the results show that the cover obtained by the greedy
algorithm is about 66% of the size of the one obtained by the other approach (Table 3). It should also
be pointed out that this number seems to be independent of the size of the graph. This
might give us some indication that even for the 10 largest graphs (for which
the optimum values were not available for comparison) the covers
obtained by the greedy algorithm might still be very close to the optimum.
c We ran our experiments on a DELL R710 server with an Intel Xeon Quad Core 5500 and 5600,
12GB of physical memory and 60GB of swap memory. The smallest graph for which the SAGE
exponential approach “crashed” ran for about 12 hours and stopped due to lack of memory.
Table 2. Algorithms for Vertex Cover in Complex Networks: (1) The first column is the ID of
the graph. We use it to refer to the same graphs in the other tables. (2) The second column is the
name of the file available in the databases. (3) The third column is the number of vertices of the
network. (4) The fourth column is the size of the minimum vertex cover. We used the algorithm
from the SAGE library to compute these values. Data for larger graphs could not be obtained
due to the exponential nature of the algorithm (see Section 4 for details). (5) The fifth column is
the size of the vertex cover obtained by the greedy algorithm implemented by us. (6) The sixth column is
the size of the vertex cover obtained by the 2-approximation algorithm implemented by us. (7) The seventh
column indicates the type of the network.
Net ID  Network file      # of vertices  Opt   Greedy  2-Approx  Network type
1       Airlines          235            96    97      146       Technological
2       US Air            332            149   151     230       Technological
3       Codeminer         724            191   196     334       Social Net.
4       Cpan authors      839            116   117     196       Social Net.
5       EuroSis           1285           597   608     896       Information
6       Oclinks           1899           749   763     1100      Information
7       YeastS            2284           763   773     1240      Biologic
8       CA-GrQc           5241           2783  2795    3960      Citation
9       p2p-Gnutella08    6301           2054  2070    3366      Communication
10      Wiki-Vote1        7115           2249  2370    3460      Social Net.
11      p2p-Gnutella09    8114           2574  2589    4238      Information
12      p2p-Gnutella06    8717           3405  3484    5352      Information
13      p2p-Gnutella05    8846           3428  3475    5412      Information
14      CA-HepTh          9875           4981  5003    7240      Citation
15      p2p-Gnutella04    10876          4348  4428    6624      Collaboration
16      CA-AstroPh        18771          n.a   12044   15194     Collaboration
17      p2p-Gnutella25    22687          n.a   6055    9800      Information
18      CA-CondMat        23133          n.a   13561   18150     Collaboration
19      p2p-Gnutella24    26518          n.a   7250    11624     Information
20      Cit-HepTh         27769          n.a   18225   23396     Citation
21      p2p-Gnutella30    36682          n.a   9321    15096     Information
22      Email-Enron       36692          n.a   14477   20674     Communication
23      Brightkite-edges  58228          n.a   22177   34814     Social Net.
24      p2p-Gnutella31    62586          n.a   15864   25582     Information
25      soc-Epinions1     75879          n.a   22418   35964     Social Net.
Table 3. (1) The first column is the graph ID. (2)
The second column shows the approximation ratio
obtained by the greedy algorithm (i.e., the value
obtained by the algorithm divided by the optimum). (3) The third column shows the approximation ratio obtained by the 2-approximation algorithm. (4)
The fourth column shows the size of the cover obtained by the greedy algorithm divided by the
size of the cover obtained by the 2-approximation algorithm.
At the bottom of the table we include the average and the standard deviation of each column.
ID        Greedy/Opt  2-App/Opt  Greedy/2-App
1         1.01        1.52       0.66
2         1.01        1.54       0.66
3         1.03        1.75       0.59
4         1.01        1.69       0.60
5         1.02        1.50       0.68
6         1.02        1.47       0.69
7         1.01        1.63       0.62
8         1.00        1.42       0.71
9         1.01        1.64       0.61
10        1.05        1.54       0.68
11        1.01        1.65       0.61
12        1.02        1.57       0.65
13        1.01        1.58       0.64
14        1.00        1.45       0.69
15        1.02        1.52       0.67
16        n.a         n.a        0.79
17        n.a         n.a        0.62
18        n.a         n.a        0.75
19        n.a         n.a        0.62
20        n.a         n.a        0.78
21        n.a         n.a        0.62
22        n.a         n.a        0.70
23        n.a         n.a        0.64
24        n.a         n.a        0.62
25        n.a         n.a        0.62
average   1.02        1.56       0.66
st. dev.  0.01        0.09       0.05
Table 4. Databases from which the networks were
obtained. For the sake of completeness
we also include the sizes of the vertex set and the edge set of each network.
ID  Database  Vertices  Edges
1   Gephi     235       1295
2   Pajek     332       2126
3   Gephi     724       1015
4   Gephi     839       2112
5   Gephi     1285      6462
6   Gephi     1899      13821
7   Pajek     2284      6646
8   Stanford  5241      14484
9   Stanford  6301      20776
10  Stanford  7115      100729
11  Stanford  8114      26013
12  Stanford  8717      31523
13  Stanford  8846      31837
14  Stanford  9875      25973
15  Stanford  10876     39993
16  Stanford  18771     198050
17  Stanford  22687     54705
18  Stanford  23133     93439
19  Stanford  26518     65368
20  Stanford  27769     352285
21  Stanford  36682     88328
22  Stanford  36692     183811
23  Stanford  58228     214023
24  Stanford  62586     147890
25  Stanford  75879     404953
References
1. D. Easley and J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly
Connected World. Cambridge University Press (2010)
2. A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. S.
Tomkins, and J. Wiener. Graph structure in the Web. Computer Networks, 33(1-6):309-320, June 2000.
3. J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan and A. S. Tomkins. The
Web as a graph: measurements, models and methods. Proceedings of the 5th Annual
International Conference on Computing and Combinatorics, 1-17, 1999.
4. M. Faloutsos, P. Faloutsos and C. Faloutsos. On power-law relationships of the internet
topology. ACM SIGCOMM Computer Communication Review, 29(4):251-262, 1999.
5. M. Jovanović, F. S. Annexstein and K. A. Berman. Modeling peer-to-peer network
topologies through “small-world” models and power laws. In IX Telecommunications
Forum, TELFOR, 2001.
6. N. Guelzim, S. Bottani, P. Bourgine and F. Képès. Topological and causal structure of
the yeast transcriptional regulatory network. Nature Genetics, 31(1):60-63, 2002.
7. S. Eubank, V. S. A. Kumar, M. V. Marathe, A. Srinivasan and N. Wang. Structural
and algorithmic aspects of massive social networks. In Proceedings of the 15th annual
ACM-SIAM Symposium on Discrete Algorithms, pages 718-727. SIAM, 2004.
8. L. Lu and F. Chung. Complex Graphs and Networks. American Math. Society (2006).
9. A.L. Barabási and E. Bonabeau. Scale-Free Networks. Scientific American 288, 50-59,
2003.
10. A.L. Barabási and R. Albert. Emergence of scaling in random networks. Science,
286(5439):509, 1999.
11. B. Bollobás, O. Riordan, J. Spencer and G. Tusnády. The Degree Sequence of a Scale-Free
Random Graph Process. Random Structures & Algorithms, Vol. 18, Issue 3, 279-290
(2001).
12. K. Park and H. Lee. On the effectiveness of route-based packet filtering for distributed
DoS attack prevention in power-law internets. ACM SIGCOMM Computer Communication Review, 31(4):15-26, October 2001.
13. C. Gkantsidis, M. Mihail and A. Saberi. Conductance and congestion in power law
graphs. SIGMETRICS Performance Evaluation Review, 31:148-159, 2003.
14. M. Mihail, C. H. Papadimitriou and A. Saberi. On certain connectivity properties of
the internet topology. Journal of Computer and System Sciences, 72(2):239-251, 2006.
15. M. Gast and M. Hauptmann. Approximability of the Vertex Cover Problem in Power
Law Graphs. Computing Research Repository (CoRR), arXiv:1204.0982, 2012.
16. M. Hauptmann, M. Gast and M. Karpinski. Inapproximability of Dominating Set in
Power Law Graphs. Computing Research Repository (CoRR), arXiv:1212.3517, 2012.
17. M. Hauptmann, M. Gast and M. Karpinski Improved Approximation Lower Bounds for
Vertex Cover on Power Law Graphs and Some Generalizations. Computing Research
Repository (CoRR), arXiv:1210.2698, 2012.
18. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory
of NP-Completeness. W. H. Freeman & Co. New York, NY (1979).
19. C. H. Papadimitriou and K. Steiglitz. Combinatorial optimization: algorithms and
complexity. Prentice Hall (1982).
20. S. Dasgupta, C.H. Papadimitriou and U.V. Vazirani. Algorithms. McGraw-Hill, 2008.
21. I. Dinur and S. Safra. On the hardness of approximating vertex cover. Annals of Mathematics, Pages 439-485 from Volume 162 (2005).
22. S. Khot, O. Regev. Vertex cover might be hard to approximate to within 2 − ϵ. Journal
of Computer and System Sciences Volume 74, Issue 3, May 2008, Pages 335-349.
23. G. Karakostas. A better approximation ratio for the vertex cover problem. ACM Transactions on Algorithms, Volume 5 Issue 4, Article No. 41, 2009.
24. GEPHI Datasets, http://wiki.gephi.org/index.php/Datasets
25. PAJEK Datasets, http://vlado.fmf.uni-lj.si/pub/networks/data
26. Stanford Large Network Dataset Collec. http://snap.stanford.edu/data
27. SAGE, http://www.sagemath.org/library.html
28. C. M. Schneider, A. A. Moreira, J. S. Andrade Jr., S. Havlin and H. J. Herrmann.
Onion-like Network Topology Enhances Robustness against Malicious Attacks. Journal
of Statistical Mechanics: Theory and Experiment (2011) P01027