DOI: 10.1145/1353343.1353423

OrthoCluster: a new tool for mining synteny blocks and applications in comparative genomics

Published: 25 March 2008 Publication History

Abstract

By comparing genomes among both closely and distantly related species, comparative genomics analysis characterizes the structures and functions of different genomes in both conserved and divergent regions. Synteny blocks, which are conserved blocks of genes on chromosomes of related species, play important roles in comparative genomics analysis. Although a few tools have been designed to identify synteny blocks, most of them cannot handle some challenging application requirements, particularly the strandedness of genes, gene inversions, gene duplications, and the comparison of more than two genomes. We developed a data mining tool, OrthoCluster, which can handle all of those challenges. It is publicly available at http://genome.sfu.ca/projects/orthocluster. OrthoCluster takes the annotated gene sets of candidate genomes and pairwise orthologous relationships as input and efficiently identifies the complete set of synteny blocks. In addition, OrthoCluster identifies four types of genome rearrangement events, namely inversion, transposition, insertion/deletion, and reciprocal translocation. To be flexible in various application scenarios, OrthoCluster comes with a systematic set of parameters, such as the synteny block size, the number of mismatches allowed, whether strandedness is enforced, and whether gene ordering is preserved. Furthermore, OrthoCluster can be used to identify segmental duplications in a genome. In this paper, we introduce the major technical ideas and present some interesting findings obtained using OrthoCluster.
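To make the notion of a synteny block concrete, the following is a minimal illustrative sketch in Python. It is not OrthoCluster's algorithm, which the abstract does not describe in detail. It covers only the simplest case: two genomes, one chromosome each, one-to-one orthologs, and no mismatches allowed. The function name `find_perfect_blocks`, the parameter names `min_size` and `enforce_strand`, and the toy gene identifiers are all hypothetical and are only loosely modeled on the parameters listed in the abstract.

```python
from typing import Dict, List, Optional, Tuple

Gene = Tuple[str, str]         # (gene_id, strand), strand is "+" or "-"
Block = Tuple[List[str], str]  # (gene ids from genome A, "forward" or "inverted")


def find_perfect_blocks(
    genome_a: List[Gene],
    genome_b: List[Gene],
    orthologs: Dict[str, str],   # assumed one-to-one: gene in A -> gene in B
    min_size: int = 2,           # hypothetical analogue of "synteny block size"
    enforce_strand: bool = True, # hypothetical analogue of "strandedness enforced"
) -> List[Block]:
    """Return maximal runs of adjacent genes in A whose orthologs are
    adjacent in B, either in the same order or in reversed order."""
    pos_b = {gid: i for i, (gid, _) in enumerate(genome_b)}
    strand_b = {gid: s for gid, s in genome_b}

    # For each gene in A: (position of its ortholog in B, same strand?) or None.
    anchors: List[Optional[Tuple[int, bool]]] = []
    for gid, strand in genome_a:
        partner = orthologs.get(gid)
        if partner is None or partner not in pos_b:
            anchors.append(None)
        else:
            anchors.append((pos_b[partner], strand == strand_b[partner]))

    def compatible(x, y, step: int) -> bool:
        # step = +1: same gene order in B; step = -1: reversed order (inversion).
        if x is None or y is None or y[0] - x[0] != step:
            return False
        if not enforce_strand:
            return True
        # Same order should keep strands; an inversion should flip them.
        expected_same = step == 1
        return x[1] == expected_same and y[1] == expected_same

    blocks: List[Block] = []
    i, n = 0, len(anchors)
    while i < n - 1:
        step = next(
            (s for s in (1, -1) if compatible(anchors[i], anchors[i + 1], s)),
            None,
        )
        if step is None:
            i += 1
            continue
        j = i + 1
        while j + 1 < n and compatible(anchors[j], anchors[j + 1], step):
            j += 1
        if j - i + 1 >= min_size:
            orientation = "forward" if step == 1 else "inverted"
            blocks.append(([gid for gid, _ in genome_a[i:j + 1]], orientation))
        i = j + 1
    return blocks


# Toy data (invented for illustration): genes a4..a6 are inverted in genome B.
genome_a = [("a1", "+"), ("a2", "+"), ("a3", "-"),
            ("a4", "+"), ("a5", "+"), ("a6", "-")]
genome_b = [("b1", "+"), ("b2", "+"), ("b3", "-"),
            ("b6", "+"), ("b5", "-"), ("b4", "-")]
orthologs = {f"a{k}": f"b{k}" for k in range(1, 7)}

blocks = find_perfect_blocks(genome_a, genome_b, orthologs)
print(blocks)
```

On this toy input the sketch reports one forward block (a1 to a3) and one inverted block (a4 to a6). Handling mismatches, gene duplications, multiple chromosomes, and more than two genomes, which the abstract names as the hard cases, would require a considerably more elaborate search than this linear scan.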

          Published In

          EDBT '08: Proceedings of the 11th international conference on Extending database technology: Advances in database technology
          March 2008
          762 pages
          ISBN:9781595939265
          DOI:10.1145/1353343
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • Research-article

          Conference

          EDBT '08

          Acceptance Rates

          Overall Acceptance Rate 7 of 10 submissions, 70%
