DOI:10.1145/1569901.1569933
research-article

Modeling evolutionary fitness for DNA motif discovery

Published: 08 July 2009

Abstract

The motif discovery problem consists of finding over-represented patterns in a collection of sequences. Its difficulty stems partly from the large number of possibilities to define both the motif space to be searched and the notion of over-representation. Since the size of the search space is generally exponential in the motif length, many heuristic methods, including evolutionary algorithms, have been developed. However, comparatively little attention has been devoted to the adequate evaluation of motif quality, especially when comparing motifs of different lengths. We propose an evolution strategy to solve the motif discovery problem based on a new fitness function that simultaneously takes into account (1) the number of motif occurrences, (2) the motif length, and (3) its information content. Experimental results show that the proposed method succeeds in uncovering the correct motif positions and length with high accuracy.
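
To make the third ingredient concrete, here is a minimal sketch of how the information content of a set of aligned motif occurrences is commonly computed: the per-column relative entropy, in bits, against a background distribution, summed over columns. This is the standard textbook measure and not the fitness function proposed in the paper; the function name `information_content`, the uniform default background, and the optional pseudocount are assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def information_content(occurrences, background=None, pseudocount=0.0):
    """Return the summed per-column relative entropy (bits) of aligned sites.

    Hypothetical helper for illustration only; it is not the paper's fitness.
    """
    if not occurrences:
        raise ValueError("need at least one occurrence")
    length = len(occurrences[0])
    if any(len(site) != length for site in occurrences):
        raise ValueError("all occurrences must have the same length")
    if background is None:
        # Assumption: uniform background over the four nucleotides.
        background = {base: 1.0 / len(ALPHABET) for base in ALPHABET}

    n = len(occurrences)
    total = 0.0
    for col in range(length):
        counts = Counter(site[col] for site in occurrences)
        for base in ALPHABET:
            freq = (counts[base] + pseudocount) / (n + len(ALPHABET) * pseudocount)
            if freq > 0:  # the term 0 * log(0) is taken as 0
                total += freq * math.log2(freq / background[base])
    return total


conserved = information_content(["ACGT", "ACGT", "ACGT"])
uninformative = information_content(["A", "C", "G", "T"])
```

Under a uniform background, each perfectly conserved column contributes 2 bits, so the raw sum grows with motif length. This is one reason why comparing motifs of different lengths needs more than information content alone.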

Published In

GECCO '09: Proceedings of the 11th Annual conference on Genetic and evolutionary computation
July 2009
2036 pages
ISBN:9781605583259
DOI:10.1145/1569901
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computational biology
  2. dna
  3. ea
  4. es
  5. evolution strategies
  6. evolutionary algorithms
  7. local search
  8. motif discovery
  9. transcription factor

Qualifiers

  • Research-article

Conference

GECCO09: Genetic and Evolutionary Computation Conference
July 8 - 12, 2009
Montreal, Québec, Canada

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
