Discovering top-k non-redundant clusterings in attributed graphs

Authors:

Gustavo Paiva Guedes,

Eduardo Ogasawara,

Eduardo Bezerra,

Geraldo Xexeo

Neurocomputing, Volume 210, Issue C

Pages 45 - 54

https://doi.org/10.1016/j.neucom.2015.10.145

Published: 19 October 2016

Abstract

Many graph clustering algorithms focus on producing a single partition of the vertices in the input graph. However, a single partition may not provide sufficient insight into the underlying data, so it is worthwhile to explore alternative clustering solutions. Many areas, such as social media marketing, demand exploring multiple clustering solutions in social networks to support behavior analysis, for example to find potential customers or influential members according to different perspectives. Additionally, it is desirable to provide not just multiple clustering solutions but multiple non-redundant ones, in order to reveal the many possible facets of the underlying dataset. In this paper, we propose RM-CRAG, a novel algorithm to discover the top-k non-redundant clustering solutions in attributed graphs, i.e., a ranking of clusterings that share the least amount of information, in the information-theoretic sense. We also propose MVNMI, an evaluation criterion to assess the quality of a set of clusterings. Experimental results using different datasets show the effectiveness of the proposed algorithm.
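To make "clusterings that share the least amount of information" concrete, the following is a minimal sketch of how redundancy between two partitions of the same vertices could be measured with normalized mutual information (NMI), and how a small non-redundant subset might be picked greedily. It is not RM-CRAG and not MVNMI, whose definitions the abstract does not give. The function names, the geometric-mean normalization, the choice of the first candidate as seed, and the greedy max-NMI criterion are all assumptions made for illustration.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a cluster-label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same vertices."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI with geometric-mean normalization (an assumed choice of normalizer)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        # A single-cluster partition carries no information; return 0 by convention here.
        return 0.0
    return mutual_information(a, b) / sqrt(ha * hb)


def greedy_non_redundant(candidates, k):
    """Hypothetical helper: greedily pick up to k candidate indices, each time
    adding the clustering whose highest NMI with the already chosen ones is lowest."""
    chosen = [0]  # assumption: the first candidate seeds the selection
    while len(chosen) < min(k, len(candidates)):
        rest = [i for i in range(len(candidates)) if i not in chosen]
        best = min(
            rest,
            key=lambda i: max(nmi(candidates[i], candidates[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Three toy partitions of four vertices; the second is the first with labels renamed.
    candidates = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
    print(greedy_non_redundant(candidates, 2))
```

In the toy input, the second partition is a relabeling of the first and is therefore fully redundant with it, while the third splits the vertices along an independent dimension, so the greedy step prefers the third.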

Issue length: 303 pages

ISSN: 0925-2312

Copyright © Elsevier B.V.

Publisher: Elsevier Science Publishers B. V., Netherlands
