DOI: 10.1145/967900.967931
Article

A new framework for clustering algorithm evaluation in the domain of functional genomics

Published: 14 March 2004

Abstract

Clustering algorithms are widely used in the computational analysis of microarray data. However, due to the lack of domain knowledge, it is often difficult to judge their performance. In this paper, we introduce a new framework for the evaluation of clustering algorithms as applied to regulatory pathway reconstruction. A pilot study was conducted on the hierarchical clustering algorithm, for which we obtained qualitative characterizations of the number of samples needed, as well as the denseness of the subnetwork required, to achieve an accurate partition. For experimental scientists, this evaluation framework provides a method to select and calibrate clustering algorithms. It can also provide a confidence measure for the results of a clustering algorithm when certain restrictions on the experimental setup, such as the number of samples available, are known in advance.

References

[1] A.A. Alizadeh. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, (403):503--511, 2000.
[2] M. Bittner. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature, (406):536--540, 2000.
[3] G. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, (9):309--347, 1992.
[4] M. Eisen, P. Spellman, P. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, (95):14863--8, 1998.
[5] N. Friedman, M. Linial, I. Nachman, and D. Pe'er. Using Bayesian networks to analyze expression data. In Proceedings of the Fourth Annual Inter. Conf. on Computational Molecular Biology (RECOMB), 2000.
[6] A. Hartemink, D. Gifford, T. Jaakkola, and R. Young. Using graphical models and genomic expression data to statistically validate models of genetic regulatory networks. In Proceedings of Pacific Symposium on Biocomputing, 2001.
[7] D. Heckerman. A tutorial on learning with Bayesian networks. In Learning in Graphical Models, 1999.
[8] W. Lam and F. Bacchus. Learning Bayesian belief networks: An approach based on the MDL principle. Comp. Int., (10):268--293, 1994.
[9] S. Lauritzen. Propagation of probabilities, means, and variances in mixed graphical association models. Journal of the American Statistical Association, (87):1098--1108, 1992.
[10] H. McAdams and A. Arkin. Stochastic mechanisms in gene expression. Proc. Natl. Acad. Sci. USA, (94):814--819, 1997.
[11] J. Pearl. Fusion, propagation, and structuring in belief networks. Artificial Intelligence, (29):241--288, 1986.
[12] D. Pe'er, A. Regev, G. Elidan, and N. Friedman. Inferring subnetworks from perturbed expression profiles. In Proceedings of the Ninth International Conference on Intelligent Systems for Molecular Biology (ISMB), 2001.
[13] R. Shachter. Probabilistic inference and influence diagrams. Operations Research, (36):589--604, 1988.
[14] Z. Szallasi. Genetic network analysis in light of massively parallel biological data acquisition. In Proceedings of Pacific Symposium on Biocomputing, 1999.
[15] C. Yoo, V. Thorsson, and G. Cooper. Discovery of causal relationships in a gene-regulation pathway from a mixture of experimental and observational DNA microarray data. In Proceedings of Pacific Symposium on Biocomputing, 2002.

      Reviews

      Adrian Pasculescu

      Clustering algorithms, a special aspect of data mining, are the focus of this paper. Its main objective is to describe a way to evaluate different clustering algorithms. The immediate application that the author has in mind is regulatory pathway reconstruction in functional genomics. This endeavor is inspired by the difficulty researchers have in judging which clustering is best, especially when such methods are used in domains where very little is known. The proposed framework bases its evaluation of different clustering algorithms on how well they partition the nodes of Bayesian network models of gene interactions. The main reasons given for choosing this well-studied statistical method are that it has been adopted in the reconstruction of gene regulatory and interaction networks, and that it captures, albeit not perfectly, the stochastic process that gene regulation is believed to be. The main contribution of the paper, according to its author, is in establishing a framework by which scientists can select and calibrate clustering algorithms. The initial studies were conducted for hierarchical clustering algorithms, using correlation as a distance metric. It is the author's intent to extend the framework, and the study, to k-means, self-organizing maps, support vector machines, and even to clustering analysis of time series data. The author also promises to test the validity of the findings for larger networks (those with many more than 40 nodes). To fully understand the statistical and genetic background of the paper, readers should consult the specified references.

      Online Computing Reviews Service
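
The evaluation idea described in the review can be illustrated with a small, hedged sketch. It is not the paper's procedure: the toy network (two disjoint linear-Gaussian chains), the edge weight and noise level, the distance 1 - |r|, average linkage, and the Rand index as the agreement score are all assumptions chosen for illustration, and every function name is hypothetical. The sketch samples expression values from a known network, clusters the genes hierarchically using correlation as the distance, and scores how well the clusters match the known subnetworks as the number of samples changes.

```python
import math
import random


def simulate_expression(n_samples, n_genes_per_module=4, weight=0.9, noise=0.5, seed=0):
    """Sample from a toy linear-Gaussian Bayesian network (an assumption).

    The network is two disjoint chains g0 -> g1 -> ... (one per module).
    Returns (data, labels): data[g] holds the samples of gene g and
    labels[g] is the index of the module that gene g belongs to.
    """
    rng = random.Random(seed)
    data, labels = [], []
    for module in range(2):
        parent = None
        for _ in range(n_genes_per_module):
            if parent is None:
                values = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            else:
                values = [weight * p + rng.gauss(0.0, noise) for p in parent]
            data.append(values)
            labels.append(module)
            parent = values
    return data, labels


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def hierarchical_clusters(data, k):
    """Average-linkage agglomerative clustering with distance 1 - |r|."""
    n = len(data)
    dist = [[0.0 if i == j else 1.0 - abs(pearson(data[i], data[j]))
             for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    assignment = [0] * n
    for cluster_id, members in enumerate(clusters):
        for gene in members:
            assignment[gene] = cluster_id
    return assignment


def rand_index(truth, predicted):
    """Fraction of gene pairs on which the two partitions agree."""
    agree = total = 0
    for i in range(len(truth)):
        for j in range(i + 1, len(truth)):
            total += 1
            agree += (truth[i] == truth[j]) == (predicted[i] == predicted[j])
    return agree / total


if __name__ == "__main__":
    # Score the partition for several sample sizes (values are arbitrary).
    for n_samples in (5, 20, 100, 500):
        data, truth = simulate_expression(n_samples, seed=1)
        predicted = hierarchical_clusters(data, k=2)
        print(n_samples, round(rand_index(truth, predicted), 3))
```

Running the script prints one agreement score per sample size. Varying `weight` or `noise` gives a rough feel for how the strength of the dependencies within a subnetwork affects the recovered partition.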

      Information & Contributors

      Information

      Published In

      SAC '04: Proceedings of the 2004 ACM symposium on Applied computing
      March 2004
      1733 pages
      ISBN:1581138121
      DOI:10.1145/967900
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Bayesian network
      2. evaluation
      3. hierarchical clustering

      Qualifiers

      • Article

      Conference

      SAC04
      SAC04: The 2004 ACM Symposium on Applied Computing
      March 14 - 17, 2004
      Nicosia, Cyprus

      Acceptance Rates

      Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
