Clustering Large Attributed Graphs: A Balance between Structural and Attribute Similarities

Authors: Hong Cheng, Yang Zhou, Jeffrey Xu Yu

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 5, Issue 2

Article No.: 12, Pages 1 - 33

https://doi.org/10.1145/1921632.1921638

Published: 01 February 2011

Abstract

Social networks, sensor networks, biological networks, and many other information networks can be modeled as large graphs. Graph vertices represent entities, and graph edges represent their relationships or interactions. In many large graphs, one or more attributes are associated with every vertex to describe its properties. In many application domains, graph clustering techniques are very useful for detecting densely connected groups in a large graph, as well as for understanding and visualizing it. The goal of graph clustering is to partition the vertices of a large graph into clusters based on criteria such as vertex connectivity or neighborhood similarity. Many existing graph clustering methods focus mainly on the topological structure and largely ignore the vertex properties, which are often heterogeneous. In this article, we propose a novel graph clustering algorithm, SA-Cluster, which achieves a good balance between structural and attribute similarities through a unified distance measure. Our method partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values. An effective method is proposed to automatically learn the degree of contribution of structural similarity and attribute similarity. Theoretical analysis is provided to show that SA-Cluster converges quickly through iterative cluster refinement. Some optimization techniques for matrix computation are proposed to further improve the efficiency of SA-Cluster on large graphs. Extensive experimental results demonstrate the effectiveness of SA-Cluster through comparisons with state-of-the-art graph clustering and summarization methods.
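The abstract does not spell out the unified distance measure, so the following is only a minimal sketch of the general idea of blending structural and attribute similarity into one distance. It is not SA-Cluster's actual measure. The Jaccard neighborhood overlap, the attribute-match fraction, the fixed weight `alpha`, and all function names are assumptions made for illustration; the article describes learning the relative contributions automatically, which this sketch does not attempt.

```python
def structural_similarity(adj, u, v):
    """Jaccard overlap of closed neighborhoods (an assumed stand-in measure)."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def attribute_similarity(attrs, u, v):
    """Fraction of attributes on which u and v hold the same value."""
    keys = list(attrs[u])
    return sum(attrs[u][k] == attrs[v][k] for k in keys) / len(keys)


def unified_distance(adj, attrs, u, v, alpha=0.5):
    """Blend both similarities; alpha is a hand-picked weight (assumption)."""
    sim = (alpha * structural_similarity(adj, u, v)
           + (1 - alpha) * attribute_similarity(attrs, u, v))
    return 1.0 - sim


def assign_to_medoids(adj, attrs, medoids, alpha=0.5):
    """Assign every vertex to its nearest medoid under the unified distance."""
    return {
        v: min(medoids, key=lambda m: unified_distance(adj, attrs, v, m, alpha))
        for v in adj
    }


# Toy attributed graph: two triangles joined by the edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
attrs = {
    0: {"topic": "db"}, 1: {"topic": "db"}, 2: {"topic": "db"},
    3: {"topic": "ml"}, 4: {"topic": "ml"}, 5: {"topic": "ml"},
}

clusters = assign_to_medoids(adj, attrs, medoids=[0, 5])
print(clusters)
```

On this toy graph, vertices 0 to 2 are assigned to medoid 0 and vertices 3 to 5 to medoid 5, because each triangle is both densely connected and uniform in its `topic` attribute. Raising `alpha` toward 1 makes the assignment depend more on structure, and lowering it toward 0 makes it depend more on attributes.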
