research-article

Discovering top-k non-redundant clusterings in attributed graphs

Published: 19 October 2016

Abstract

Many graph clustering algorithms produce a single partition of the vertices of the input graph. A single partition, however, may not provide sufficient insight into the underlying data, which makes alternative clustering solutions worth exploring. Many areas, such as social media marketing, demand multiple clustering solutions over social networks so that behavior analysis can find, for example, potential customers or influential members according to different perspectives. It is also desirable that the multiple clustering solutions be non-redundant, in order to reveal the many possible facets of the underlying dataset. In this paper, we propose RM-CRAG, a novel algorithm to discover the top-k non-redundant clustering solutions in attributed graphs, i.e., a ranking of clusterings that share the least amount of information, in the information-theoretic sense. We also propose MVNMI, an evaluation criterion to assess the quality of a set of clusterings. Experimental results on different datasets show the effectiveness of the proposed algorithm.
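To make "clusterings that share the least amount of information" concrete, the sketch below measures the information shared by two clusterings with normalized mutual information (NMI, using the square-root normalization) and then greedily picks k candidates with low pairwise NMI. This is only an illustration under stated assumptions: it is not RM-CRAG and not the MVNMI criterion, neither of which is defined in the abstract. The function names, the greedy rule, and the assumption that candidates arrive ordered by quality are all hypothetical.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a clustering given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two clusterings of the same vertices."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI in [0, 1]: 0 means no shared information, 1 means identical partitions."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:  # a single-cluster partition carries no information
        return 0.0
    return mutual_information(a, b) / sqrt(ha * hb)


def pick_least_redundant(candidates, k):
    """Hypothetical greedy selection, not the paper's algorithm.

    Assumes candidates are already ordered by quality, keeps the first one,
    then repeatedly adds the candidate whose highest NMI against the
    already chosen clusterings is smallest.
    """
    chosen = [0]
    while len(chosen) < min(k, len(candidates)):
        rest = [i for i in range(len(candidates)) if i not in chosen]
        best = min(
            rest,
            key=lambda i: max(nmi(candidates[i], candidates[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


# Three clusterings of four vertices: b is a relabeling of a, c is independent of a.
a = [0, 0, 1, 1]
b = [1, 1, 0, 0]
c = [0, 1, 0, 1]
selected = pick_least_redundant([a, b, c], k=2)
```

In this toy input, b is fully redundant with a because it is the same partition under different labels, so the greedy step prefers c. A real method would also have to weigh clustering quality against redundancy, which this sketch leaves out.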

Published In

Neurocomputing, Volume 210, Issue C
October 2016
303 pages

Publisher

Elsevier Science Publishers B. V.
Netherlands

Author Tags

1. Attributed graphs
2. Multiple clusterings
3. Non-redundant clusterings
4. Spectral clustering
5. Top-k clusterings
6. Top-k non-redundant clusterings
