DOI: 10.1145/332306.332566

PDB_ISL: an intermediate sequence library for protein structure assignment

Published: 08 April 2000

Abstract

For large-scale structural assignment to sequences, as in computational structural genomics, a fast yet sensitive homology search procedure is essential. A new approach using intermediate sequences was tested as a shortcut to iterative multiple sequence search methods such as PSI-BLAST and hidden Markov models. A library containing potential intermediate sequences for proteins of known structure (PDB_ISL) was constructed. The sequences in the library were collected from a large sequence database by running PSI-BLAST with the sequences of the domains of proteins of known structure as queries. Sequences of proteins of unknown structure can be matched to distantly related proteins of known structure by using any pairwise sequence comparison method to find homologues in PDB_ISL. Searches of PDB_ISL were calibrated, and the number of correct matches found at a given error rate was the same as that found by PSI-BLAST. The advantage of this library is that it is searched with pairwise sequence comparison methods, such as FASTA or BLAST2, and can therefore be searched easily and, in many cases, much more quickly than with an iterative multiple sequence comparison method. The procedure is roughly twenty times faster than PSI-BLAST for small genomes and several hundred times faster for large genomes such as that of C. elegans.
Sequences can be submitted to the PDB_ISL servers at
http://stash.mrc-lmb.cam.ac.uk/PDB_ISL/
ftp://ftp.ebi.ac.uk/pub/databases/pdb_isl/
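
The matching step described in the abstract can be illustrated with a minimal sketch. It assumes that a pairwise search tool (for example FASTA or BLAST2) has already been run against the library and that its hits have been parsed into simple records. The names `PairwiseHit`, `Assignment`, and `assign_structures`, the dictionary that maps each library sequence to the domain it was collected for, and the default E-value cutoff of 1e-3 are all hypothetical choices for illustration; they are not taken from the paper, and the calibrated thresholds used by the authors are not given here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class PairwiseHit:
    """One hit from a pairwise search of a query against the library (hypothetical record)."""
    query_id: str
    library_id: str
    e_value: float


@dataclass(frozen=True)
class Assignment:
    """A structural assignment reached through an intermediate library sequence."""
    domain_id: str
    via_library_id: str
    e_value: float


def assign_structures(
    hits: Iterable[PairwiseHit],
    library_to_domain: Dict[str, str],
    max_e_value: float = 1e-3,  # assumed cutoff, not the paper's calibrated value
) -> Dict[str, Assignment]:
    """Pick, for each query, the most significant library hit and report
    the known-structure domain that library sequence was collected for."""
    best: Dict[str, Assignment] = {}
    for hit in hits:
        if hit.e_value > max_e_value:
            continue  # not significant enough under the assumed cutoff
        domain_id = library_to_domain.get(hit.library_id)
        if domain_id is None:
            continue  # hit is not a library member with a known parent domain
        current = best.get(hit.query_id)
        if current is None or hit.e_value < current.e_value:
            best[hit.query_id] = Assignment(domain_id, hit.library_id, hit.e_value)
    return best
```

The point of the sketch is that the query never has to match the known-structure domain directly: a significant pairwise match to any library member is enough, because each member was collected as a homologue of a specific domain when the library was built.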

Published In

RECOMB '00: Proceedings of the fourth annual international conference on Computational molecular biology
April 2000
329 pages
ISBN: 1581131860
DOI: 10.1145/332306

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. PDB_ISL
  2. PSI-BLAST
  3. homology
  4. structure assignment

Conference

RECOMB00

Acceptance Rates

Overall Acceptance Rate 148 of 538 submissions, 28%
