Chance and statistical significance in protein and DNA sequence analysis

Science. 1992 Jul 3;257(5066):39-49. doi: 10.1126/science.1621093.

Abstract

Statistical approaches help in the determination of significant configurations in protein and nucleic acid sequence data. Three recent statistical methods are discussed: (i) score-based sequence analysis that provides a means for characterizing anomalies in local sequence text and for evaluating sequence comparisons; (ii) quantile distributions of amino acid usage that reveal general compositional biases in proteins and evolutionary relations; and (iii) r-scan statistics that can be applied to the analysis of spacings of sequence markers.
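Methods (i) and (iii) can be made concrete with a short Python sketch. This is not the authors' implementation. The scoring scheme (charged residues D, E, K, R scored +2, all others -1), the function names, and the choice to count the sequence ends as boundaries when computing marker spacings are hypothetical and chosen only for illustration. The sketch finds the highest-scoring contiguous segment of a scored sequence, and computes r-scan values, taken here to be sums of r consecutive spacings between sorted marker positions.

```python
from typing import List, Sequence, Tuple


def score_sequence(seq: str, positive: str = "DEKR",
                   hit: float = 2.0, miss: float = -1.0) -> List[float]:
    """Map residues to scores. The +2 / -1 charged-residue scheme is a
    hypothetical example, not a scheme taken from the paper."""
    return [hit if aa in positive else miss for aa in seq]


def max_segment_score(scores: Sequence[float]) -> Tuple[float, int, int]:
    """Return (best_score, start, end) of the highest-scoring contiguous
    segment (end exclusive), in one pass: at each position the best segment
    ending there is the cumulative sum minus the smallest earlier prefix sum."""
    best, best_start, best_end = 0.0, 0, 0
    cum = 0.0                    # cumulative score up to position i
    min_cum, min_idx = 0.0, 0    # smallest prefix sum seen and where it ends
    for i, s in enumerate(scores):
        cum += s
        if cum - min_cum > best:
            best, best_start, best_end = cum - min_cum, min_idx, i + 1
        if cum < min_cum:
            min_cum, min_idx = cum, i + 1
    return best, best_start, best_end


def r_scans(positions: Sequence[int], length: int, r: int) -> List[int]:
    """Sums of r consecutive spacings between sorted marker positions.
    Treating position 0 and `length` as boundaries is an assumption."""
    pts = [0] + sorted(positions) + [length]
    spacings = [b - a for a, b in zip(pts, pts[1:])]
    if not 1 <= r <= len(spacings):
        raise ValueError("r must be between 1 and the number of spacings")
    return [sum(spacings[i:i + r]) for i in range(len(spacings) - r + 1)]


if __name__ == "__main__":
    seq = "AAKDEKRAAAL"  # toy sequence with a run of charged residues
    best, start, end = max_segment_score(score_sequence(seq))
    print(best, seq[start:end])  # 10.0 KDEKR

    # Toy marker positions on a sequence of length 100.
    scans = r_scans([10, 12, 14, 50, 90], length=100, r=2)
    print(min(scans), max(scans))  # 4 76
```

In the toy marker example, the unusually small minimum 2-scan (4) points to the tight cluster of markers at positions 10, 12 and 14, while a large maximum points to a long stretch without markers. Score-based statistics of this kind usually assume that the expected score per residue is negative, so that long segments do not score highly by chance. The +2 / -1 scheme meets this condition only when charged residues make up less than one third of the sequence.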

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Sequence*
  • Animals
  • Bacillus subtilis / genetics
  • Base Sequence*
  • DNA / chemistry
  • DNA / genetics*
  • Drosophila / genetics
  • Escherichia coli / genetics
  • Humans
  • Mathematics
  • Models, Genetic*
  • Models, Statistical*
  • Proteins / chemistry
  • Proteins / genetics*
  • Saccharomyces cerevisiae / genetics
  • Viruses / genetics

Substances

  • Proteins
  • DNA