Genome Taxonomy Database

Content
Data types captured: Proposed prokaryotic nomenclature, phylogenomic data
Contact
Research center: Australian Centre for Ecogenomics, University of Queensland
Authors:
  • Phil Hugenholtz
  • Maria Chuvochina
  • Christian Rinke
Primary citation: PMID 30148503
Release date: 2018
Access
Website: gtdb.ecogenomic.org
Download URL: gtdb.ecogenomic.org/downloads
Web service URL: gtdb.ecogenomic.org/tree
Miscellaneous
License: CC BY-SA 4.0
Version: 09-RS220 (24 April 2024)
Curation policy: mixed

The Genome Taxonomy Database (GTDB) is an online database that maintains information on a proposed nomenclature of prokaryotes, following a phylogenomic approach based on a set of conserved single-copy proteins. In addition to resolving paraphyletic groups, this method also reassigns taxonomic ranks algorithmically, updating names in both cases. [1] Information for archaea was added in 2020, [2] along with a species classification based on average nucleotide identity. [3] Each update incorporates new genomes as well as automated and manual curation of the taxonomy. [4]

An open-source tool called GTDB-Tk is available to classify draft genomes into the GTDB hierarchy. [5] The GTDB system, via GTDB-Tk, has been used to catalogue not-yet-named bacteria in the human gut microbiome and other metagenomic sources. [6] [7]

The GTDB was incorporated into Bergey's Manual of Systematics of Archaea and Bacteria in 2019 as its phylogenomic resource. [8]

Methodology

The genomes used to construct the phylogeny are obtained from NCBI (RefSeq and GenBank), and GTDB releases are indexed to RefSeq releases, starting with release 76. Increasingly, this dataset includes draft genomes of uncultured microorganisms obtained from metagenomes and single cells, which improves genomic representation of the microbial world. All genomes are independently quality controlled using CheckM before inclusion in GTDB. [9]

Genomes first undergo gene calling to extract genes. The taxonomy is based on trees inferred with FastTree from an aligned, concatenated set of 120 single-copy marker proteins for Bacteria under a WAG model, and with IQ-TREE from a concatenated set of 53 (since RS207; 122 before) marker proteins for Archaea under the PMSF model. Additional marker sets, including concatenated ribosomal proteins and ribosomal RNA genes, are used to cross-validate tree topologies. [9] The relative evolutionary divergence (RED) metric, which determines the taxonomic ranks used, is derived from the two main trees by the PhyloRank program. [1]
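As a rough illustration of how a RED-style value can be computed on a rooted tree, the sketch below assumes the commonly stated definition: the root has RED 0, leaves have RED 1, and an internal node gets p + (d / u) × (1 − p), where p is the parent's RED, d is the branch length to the parent, and u is the mean distance from the parent to the leaves below the node. The `Node` class and function names are hypothetical, node names are assumed to be unique, and this is not PhyloRank's actual code, which also deals with rooting and other details not shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str                      # assumed unique within the tree
    length: float = 0.0            # branch length to the parent
    children: List["Node"] = field(default_factory=list)


def leaf_distances(node: Node) -> List[float]:
    """Distances from `node` to each leaf below it."""
    if not node.children:
        return [0.0]
    return [c.length + d for c in node.children for d in leaf_distances(c)]


def red_values(root: Node) -> Dict[str, float]:
    """Assign a RED-style value to every node, visiting parents first."""
    red = {root.name: 0.0}         # root is defined as 0
    stack = [root]
    while stack:
        parent = stack.pop()
        p = red[parent.name]
        for child in parent.children:
            if not child.children:
                red[child.name] = 1.0          # leaves are defined as 1
                continue
            # u: mean distance from the parent to leaves below the child
            dists = [child.length + d for d in leaf_distances(child)]
            u = sum(dists) / len(dists)
            # Guard against all-zero branch lengths (illustrative choice)
            red[child.name] = p if u == 0 else p + (child.length / u) * (1.0 - p)
            stack.append(child)
    return red


# Toy tree: root -> X (length 1) -> a, b (length 1 each); root -> c (length 2)
toy = Node("root", children=[
    Node("X", 1.0, [Node("a", 1.0), Node("b", 1.0)]),
    Node("c", 2.0),
])
toy_red = red_values(toy)
```

In the toy tree, node X sits halfway between the root and its leaves, so its value falls midway between 0 and 1.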

Species are delimited using average nucleotide identity and alignment fraction, both calculated by skani. For species existing in a previous release, GTDB compares the quality and position of two genomes and may decide to switch to a new species representative genome. [9]
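The idea of assigning a genome to a species by average nucleotide identity (ANI) and alignment fraction (AF) can be sketched as follows. This is a minimal illustration, not GTDB's procedure: the `Hit` type, the function name, and the default cut-offs of 95% ANI and 0.5 AF are assumptions chosen for the example and are not taken from the text above. The comparison values would come from a tool such as skani.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hit:
    representative: str  # accession of a species representative genome
    ani: float           # average nucleotide identity, in percent
    af: float            # alignment fraction, between 0 and 1


def assign_species(hits: Iterable[Hit],
                   min_ani: float = 95.0,    # assumed threshold
                   min_af: float = 0.5) -> Optional[str]:  # assumed threshold
    """Return the best representative passing both cut-offs, or None."""
    passing = [h for h in hits if h.ani >= min_ani and h.af >= min_af]
    if not passing:
        return None  # no match: the genome would seed a new species cluster
    # Prefer the highest ANI, breaking ties by alignment fraction
    return max(passing, key=lambda h: (h.ani, h.af)).representative


example_hits = [
    Hit("rep_A", ani=96.2, af=0.80),
    Hit("rep_B", ani=97.1, af=0.30),  # high ANI but too little aligned
    Hit("rep_C", ani=94.0, af=0.90),  # well aligned but ANI too low
]
chosen = assign_species(example_hits)
```

Requiring both criteria keeps a short, highly similar region (as in `rep_B`) from deciding the assignment on its own.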

Taxonomy comes from the following sources:

GTDB personnel curate the taxonomy from the aforementioned sources by checking them against the results of PhyloRank and the tree.

For each new taxon, the curators try to find a name already proposed for it in the literature. If no name has been proposed, the taxon is given a placeholder name by adding a suffix to the original name, e.g. Lactobacillus gasseri_A. After "Z" comes "AA". [1]
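The suffix scheme can be sketched as a small generator. The text only states that "A" to "Z" are followed by "AA"; the continuation "AB", "AC", and so on is an assumption of this example, and the function names are hypothetical.

```python
from string import ascii_uppercase


def placeholder_suffix(index: int) -> str:
    """Return the suffix for a 0-based index: 0 -> 'A', 25 -> 'Z', 26 -> 'AA'."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = []
    n = index + 1  # bijective base-26 works on 1-based numbers
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters.append(ascii_uppercase[rem])
    return "".join(reversed(letters))


def placeholder_name(original: str, index: int) -> str:
    """Append the placeholder suffix to an existing taxon name."""
    return f"{original}_{placeholder_suffix(index)}"
```

For instance, `placeholder_name("Lactobacillus gasseri", 0)` gives `Lactobacillus gasseri_A`, matching the example in the text.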

Contents of the database

Each release contains: [10]

The web interface displays a tree based on the taxonomy (not the entire Newick file), down to the genome assembly level. Each genome assembly has a page detailing its metadata and a history of how it was classified in each GTDB release. A search function is also provided.

Effects on the accepted taxonomy

GTDB "has now become an important resource for prokaryotic taxonomy". Both its species tree and elements of its methodology are used by taxonomists to improve the current, accepted taxonomy under the Prokaryotic Code. For example, a taxonomist may make references to the GTDB tree on top of their own phylogenetic tree to further support a taxonomic proposal. [11]

There have been even more ambitious proposals to import large parts of the database into the accepted taxonomy. A 2022 article in the IJSEM, written by third-party authors, proposes to assign names based on meaningless Latin syllables to over 65 thousand GTDB taxa, [12] though none of these names have made their way into the LPSN. A 2023 article by the GTDB team proposes to import 223 higher-order taxa into the Prokaryotic Code system and 49 under the SeqCode system. [13] Many of the names published under the Prokaryotic Code have already been validated. [14] (The SeqCode requires registration of the names for valid publication, which has also been done.)

References

  1. Parks, DH; Chuvochina, M; Waite, DW; Rinke, C; Skarshewski, A; Chaumeil, PA; Hugenholtz, P (November 2018). "A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life". Nature Biotechnology. 36 (10): 996–1004. bioRxiv 10.1101/256800. doi:10.1038/nbt.4229. PMID 30148503. S2CID 52093100.
  2. Rinke, Christian; Chuvochina, Maria; Mussig, Aaron J.; Chaumeil, Pierre-Alain; Davín, Adrián A.; Waite, David W.; Whitman, William B.; Parks, Donovan H.; Hugenholtz, Philip (21 June 2021). "A standardized archaeal taxonomy for the Genome Taxonomy Database". Nature Microbiology. 6 (7): 946–959. doi:10.1038/s41564-021-00918-8. ISSN 2058-5276. PMID 34155373. S2CID 235595884.
  3. Parks, DH; Chuvochina, M; Chaumeil, PA; Rinke, C; Mussig, AJ; Hugenholtz, P (September 2020). "A complete domain-to-species taxonomy for Bacteria and Archaea". Nature Biotechnology. 38 (9): 1079–1086. bioRxiv 10.1101/771964. doi:10.1038/s41587-020-0501-8. PMID 32341564. S2CID 216560589.
  4. For information on each update, see relevant change logs. For notable, paper-worthy changes, see "Cite GTDB" section on the About page.
  5. Chaumeil, PA; Mussig, AJ; Hugenholtz, P; Parks, DH (15 November 2019). "GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database". Bioinformatics. 36 (6): 1925–1927. doi:10.1093/bioinformatics/btz848. PMC 7703759. PMID 31730192.
  6. Almeida, Alexandre; Nayfach, Stephen; Boland, Miguel; Strozzi, Francesco; Beracochea, Martin; Shi, Zhou Jason; Pollard, Katherine S.; Sakharova, Ekaterina; Parks, Donovan H.; Hugenholtz, Philip; Segata, Nicola; Kyrpides, Nikos C.; Finn, Robert D. (20 July 2020). "A unified catalog of 204,938 reference genomes from the human gut microbiome". Nature Biotechnology. 39 (1): 105–114. doi:10.1038/s41587-020-0603-3. PMC 7801254. PMID 32690973.
  7. Nayfach, Stephen; et al. (9 November 2020). "A genomic catalog of Earth's microbiomes". Nature Biotechnology. 39 (4): 499–509. doi:10.1038/s41587-020-0718-6. PMC 8041624. PMID 33169036.
  8. "Incorporation of Phylogenomics into BMSAB". Bergey's Manual Trust.
  9. "METHODS.txt (GTDB release 220)". data.gtdb.ecogenomic.org. 2024.
  10. "220.0/FILE_DESCRIPTIONS.txt".
  11. Gupta, Radhey S.; Patel, Sudip; Saini, Navneet; Chen, Shu (1 November 2020). "Robust demarcation of 17 distinct Bacillus species clades, proposed as novel Bacillaceae genera, by phylogenomics and comparative genomic analyses: description of Robertmurraya kyonggiensis sp. nov. and proposal for an emended genus Bacillus limiting it only to the members of the Subtilis and Cereus clades of species". International Journal of Systematic and Evolutionary Microbiology. 70 (11): 5753–5798. doi:10.1099/ijsem.0.004475.
  12. Pallen, MJ; Rodriguez-R, LM; Alikhan, NF (September 2022). "Naming the unnamed: over 65,000 Candidatus names for unnamed Archaea and Bacteria in the Genome Taxonomy Database". International Journal of Systematic and Evolutionary Microbiology. 72 (9). doi:10.1099/ijsem.0.005482. PMID 36125864.
  13. Chuvochina, M; Mussig, AJ; Chaumeil, PA; Skarshewski, A; Rinke, C; Parks, DH; Hugenholtz, P (17 January 2023). "Proposal of names for 329 higher rank taxa defined in the Genome Taxonomy Database under two prokaryotic codes". FEMS Microbiology Letters. 370. doi:10.1093/femsle/fnad071. PMID 37480240.
  14. Oren, Aharon; Göker, Markus (1 February 2024). "Validation List no. 215. Valid publication of new names and new combinations effectively published outside the IJSEM". International Journal of Systematic and Evolutionary Microbiology. 74 (1). doi:10.1099/ijsem.0.006173.
