Content | |
---|---|
Description | Pathogen-Host Interactions database |
Data types captured | phenotypes of microbial mutants |
Organisms | ~290 fungal, bacterial and protist pathogens of agronomic and medical importance tested on ~240 hosts |
Contact | |
Research center | Rothamsted Research |
Primary citation | PMID 34788826 |
Release date | May 2005 |
Access | |
Data format | XML, FASTA |
Website | phibase |
Tools | |
Web | PHI-base Search, PHIB-BLAST, PHI-Canto (Author curation) |
Miscellaneous | |
License | Creative Commons Attribution-NoDerivatives 4.0 International License |
Versioning | Yes |
Data release frequency | 6 monthly |
Version | 4.16 (Nov 2023) |
Curation policy | Manual Curation |
The Pathogen-Host Interactions database (PHI-base) [1] is a biological database that contains manually curated information on genes experimentally proven to affect the outcome of pathogen-host interactions. The database has been maintained by researchers at Rothamsted Research and external collaborators since 2005. [2] [3] [4] [5] PHI-base has been part of the UK node of ELIXIR, the European life-science infrastructure for biological information, since 2016. [1]
The Pathogen-Host Interactions database was developed to make use of the growing number of experimentally verified genes that mediate a pathogen's ability to cause disease and/or trigger host responses. [6]
The web-accessible database catalogues experimentally verified pathogenicity, virulence, and effector genes from bacterial, fungal, and oomycete pathogens that infect animal, plant, and fungal hosts. PHI-base was the first online resource devoted to the identification and presentation of information on fungal and oomycete pathogenicity genes and their host interactions. PHI-base is a resource for discovering candidate targets in medically and agronomically important fungal and oomycete pathogens for intervention with synthetic chemistries and natural products (fungicides). [7] [8]
Each entry in PHI-base is curated by domain experts and is supported by strong experimental evidence (such as gene disruption experiments), together with literature references describing those experiments. Each gene in PHI-base is presented with its nucleotide and deduced amino acid sequences, as well as a detailed structured description of the predicted protein's function during the host infection process. To facilitate data interoperability, genes are annotated using controlled vocabularies (Gene Ontology terms, EC numbers, etc.) and linked to external data sources such as UniProt, EMBL, and the NCBI taxonomy services.
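Controlled-vocabulary terms and cross-references follow fixed identifier formats, so a curation pipeline could check them before loading. The following Python sketch is illustrative only. The GO and UniProt accession patterns follow the formats published by those resources. The EC-number pattern is a simplified assumption. The function `is_valid` is hypothetical and is not part of any PHI-base tool.

```python
import re

# GO IDs: "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")
# Simplified EC pattern (assumption): four dot-separated fields, "-" for an
# unassigned level and an optional "n" prefix for a preliminary serial number.
EC_NUMBER = re.compile(r"\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:n?\d+|-)")
# UniProtKB accession format (6- or 10-character accessions).
UNIPROT_ACC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# NCBI taxonomy IDs: positive integers.
NCBI_TAXID = re.compile(r"[1-9]\d*")

PATTERNS = {"go": GO_ID, "ec": EC_NUMBER, "uniprot": UNIPROT_ACC, "taxid": NCBI_TAXID}


def is_valid(kind: str, value: str) -> bool:
    """Return True if value matches the whole format for the given identifier kind."""
    return PATTERNS[kind].fullmatch(value) is not None
```

Using `fullmatch` rejects identifiers with trailing junk, which a plain `search` would accept.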
Version 4.17 (May 2024) of PHI-base [1] provides information on 9,973 genes from 296 pathogens and 249 hosts and their impact on 22,408 interactions. It also holds efficacy information on ~20 drugs and their target sequences in the pathogens. PHI-base currently focuses on plant pathogenic and human pathogenic organisms, including fungi, oomycetes, and bacteria. The entire contents of the database can be downloaded in a tab-delimited format. Since the launch of version 4, PHI-base has also been searchable with the PHIB-BLAST tool, which uses the BLAST algorithm to compare a user's sequence against the sequences available in PHI-base. [9]
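A tab-delimited export can be summarised with the Python standard library alone. The sketch below makes two assumptions. First, the column names (`Gene`, `Pathogen species`, `Mutant phenotype`) are hypothetical, so check the header of a real download and adjust the constants. Second, the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical column names; adjust to match the header of the real file.
GENE_COL = "Gene"
PATHOGEN_COL = "Pathogen species"
PHENOTYPE_COL = "Mutant phenotype"

# Made-up sample rows for illustration only.
SAMPLE_TSV = (
    "Gene\tPathogen species\tMutant phenotype\n"
    "geneA\tPathogen 1\treduced virulence\n"
    "geneA\tPathogen 1\tloss of pathogenicity\n"
    "geneB\tPathogen 1\tunaffected pathogenicity\n"
    "geneC\tPathogen 2\treduced virulence\n"
)


def summarise(tsv_text: str) -> tuple[dict, Counter]:
    """Count distinct genes per pathogen and rows per mutant phenotype."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    genes_by_pathogen = defaultdict(set)
    phenotype_counts = Counter()
    for row in reader:
        genes_by_pathogen[row[PATHOGEN_COL]].add(row[GENE_COL])
        phenotype_counts[row[PHENOTYPE_COL]] += 1
    gene_counts = {pathogen: len(genes) for pathogen, genes in genes_by_pathogen.items()}
    return gene_counts, phenotype_counts
```

Genes are collected into sets, so a gene reported in several interactions is counted once per pathogen. Each row still contributes to the phenotype tally.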
In 2016 the plant portion of PHI-base was used to establish a Semantic PHI-base search tool. [10]
PHI-base has been aligned with Ensembl Genomes since 2011, FungiDB since 2016, and Global Biotic Interactions (GloBI) since 2018. [11] Each new PHI-base release is integrated into these independent databases.
PHI-base is a resource for many applications including:
› The discovery of conserved genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention
› Comparative genome analyses
› Annotation of newly sequenced pathogen genomes
› Functional interpretation of RNA sequencing and microarray experiments
› The rapid cross-checking of phenotypic differences between pathogenic species when writing articles for peer review
Use of PHI-base has been cited in over 500 peer-reviewed articles. [1]
Since 2015, the website has linked to an online literature curation tool called PHI-Canto, enabling community-driven literature curation for various pathogenic species. [12] PHI-Canto is built on a community curation framework that provides more than a curation tool. It also includes a phenotype ontology and controlled vocabularies, which give curators a unified terminology and set of rules for describing biological experiments. The central concept of this framework is the 'Metagenotype', which allows phenotypes to be annotated and assigned to specific pathogen mutant-host interactions. PHI-Canto extends the single-species curation tool developed for PomBase [13] (https://www.pombase.org), the model organism database for fission yeast.
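The metagenotype idea can be expressed as a small data model. In this model, phenotypes are keyed on a pathogen genotype paired with a host genotype, rather than on either genotype alone. The class names and fields below are hypothetical and do not reflect PHI-Canto's internal schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Genotype:
    species: str
    # Hypothetical allele descriptions; an empty tuple stands for wild type.
    alleles: tuple[str, ...] = ()


@dataclass(frozen=True)
class Metagenotype:
    """A pathogen genotype paired with the host genotype it was tested on."""
    pathogen: Genotype
    host: Genotype


@dataclass
class PhenotypeStore:
    # Phenotype annotations keyed by metagenotype (frozen dataclasses are hashable).
    annotations: dict[Metagenotype, list[str]] = field(default_factory=dict)

    def annotate(self, mg: Metagenotype, phenotype: str) -> None:
        self.annotations.setdefault(mg, []).append(phenotype)

    def phenotypes(self, mg: Metagenotype) -> list[str]:
        return self.annotations.get(mg, [])
```

Keying on the pair lets the same pathogen mutant carry different phenotypes on different hosts without the annotations conflicting.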
PHI-base is a National Capability funded by the Biotechnology and Biological Sciences Research Council (BBSRC), a UK research council. [6]
Biological databases are libraries of biological information, collected from scientific experiments, published literature, high-throughput experimental technologies, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.
In academia, computational immunology is a field of science that encompasses high-throughput genomic and bioinformatics approaches to immunology. The field's main aim is to convert immunological data into computational problems, solve these problems using mathematical and computational approaches and then convert these results into immunologically meaningful interpretations.
Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. The latest version of Pfam, 36.0, was released in September 2023 and contains 20,795 families. It is currently provided through the InterPro database.
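Scoring a sequence against a hidden Markov model relies on the forward algorithm. The algorithm sums the probability of the sequence over all hidden-state paths. Pfam's profile HMMs, built and searched with the HMMER software, have match, insert and delete states for each alignment column. The two-state model below is a deliberately simplified toy with made-up parameters, meant only to show the recursion.

```python
def forward(obs, states, start, trans, emit):
    """Probability of an observation sequence under an HMM (forward algorithm)."""
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Toy two-state model over a two-letter residue alphabet (illustrative values).
STATES = ("conserved", "variable")
START = {"conserved": 0.5, "variable": 0.5}
TRANS = {
    "conserved": {"conserved": 0.9, "variable": 0.1},
    "variable": {"conserved": 0.2, "variable": 0.8},
}
EMIT = {
    "conserved": {"L": 0.8, "K": 0.2},
    "variable": {"L": 0.3, "K": 0.7},
}
```

For the sequence "LK", `forward("LK", STATES, START, TRANS, EMIT)` gives 0.19. Because every row of the toy parameters sums to one, the probabilities of all sequences of a given length also sum to one.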
The Biological General Repository for Interaction Datasets (BioGRID) is a curated biological database of protein-protein interactions, genetic interactions, chemical interactions, and post-translational modifications. It was created in 2003 (originally referred to simply as the General Repository for Interaction Datasets) by Mike Tyers, Bobby-Joe Breitkreutz, and Chris Stark at the Lunenfeld-Tanenbaum Research Institute at Mount Sinai Hospital. It strives to provide a comprehensive curated resource for all major model organism species while attempting to remove redundancy to create a single mapping of data. Users of the BioGRID can search for their protein, chemical or publication of interest and retrieve annotation. They can also retrieve curated data as reported in the primary literature and compiled by in-house large-scale curation efforts. The BioGRID is hosted in Toronto, Ontario, Canada and Dallas, Texas, United States and is partnered with the Saccharomyces Genome Database, FlyBase, WormBase, PomBase, and the Alliance of Genome Resources. The BioGRID is funded by the NIH and CIHR. BioGRID is an observer member of the International Molecular Exchange Consortium.
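Removing redundancy can be as simple as collapsing repeated reports of the same interacting pair while keeping the set of supporting sources. The sketch below is not BioGRID's actual procedure. It assumes a hypothetical `(interactor_a, interactor_b, source_id)` record format. It also treats interactions as undirected, which suits physical binding data but not directional genetic interactions.

```python
from collections import defaultdict


def collapse(records):
    """Merge duplicate undirected interactions, keeping their supporting sources.

    records: iterable of (interactor_a, interactor_b, source_id) tuples
    (hypothetical format). Returns {(a, b): set_of_source_ids}, with each
    pair stored in sorted order so that A-B and B-A share one key.
    """
    merged = defaultdict(set)
    for a, b, source in records:
        key = tuple(sorted((a, b)))
        merged[key].add(source)
    return dict(merged)
```

Sorting each pair gives a canonical key, so reports of A with B and of B with A end up in the same entry.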
The Saccharomyces Genome Database (SGD) is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast. Further information is located at the Yeastract curated repository.
Melampsora lini is a species of fungus and plant pathogen found in Ireland and commonly known as flax rust.
The Reference Sequence (RefSeq) database is an open access, annotated and curated collection of publicly available nucleotide sequences and their protein products. RefSeq was introduced in 2000. This database is built by the National Center for Biotechnology Information (NCBI). Unlike GenBank, it provides only a single record for each natural biological molecule for major organisms, ranging from viruses to bacteria to eukaryotes.
In molecular biology, STRING is a biological database and web resource of known and predicted protein–protein interactions.
Plant disease resistance protects plants from pathogens in two ways: by pre-formed structures and chemicals, and by infection-induced responses of the immune system. Relative to a susceptible plant, disease resistance is the reduction of pathogen growth on or in the plant, while the term disease tolerance describes plants that exhibit little disease damage despite substantial pathogen levels. Disease outcome is determined by the three-way interaction of the pathogen, the plant and the environmental conditions.
Bacterial small RNAs are small RNAs produced by bacteria; they are 50- to 500-nucleotide non-coding RNA molecules, highly structured and containing several stem-loops. Numerous sRNAs have been identified using both computational analysis and laboratory-based techniques such as Northern blotting, microarrays and RNA-Seq in a number of bacterial species including Escherichia coli, the model pathogen Salmonella, the nitrogen-fixing alphaproteobacterium Sinorhizobium meliloti, marine cyanobacteria, Francisella tularensis, Streptococcus pyogenes, the pathogen Staphylococcus aureus, and the plant pathogen Xanthomonas oryzae pathovar oryzae. Bacterial sRNAs affect how genes are expressed within bacterial cells via interaction with mRNA or protein, and thus can affect a variety of bacterial functions like metabolism, virulence, environmental stress response, and structure.
This list of microRNA databases and microRNA target databases is a compilation of databases, web portals and servers used for microRNAs and their targets. MicroRNAs (miRNAs) represent an important class of small non-coding RNAs (ncRNAs) that regulate gene expression by targeting messenger RNAs.
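Many animal miRNA target-prediction tools start by searching mRNA sequences for sites complementary to the miRNA "seed". The seed is conventionally nucleotides 2 to 7 or 2 to 8 from the miRNA's 5' end. The sketch below finds exact seed-complementary sites only. It ignores site context, conservation and G:U wobble pairing, and its function names are hypothetical.

```python
# RNA base-pairing complements (A-U, C-G).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str, mrna: str, start: int = 2, end: int = 8) -> list[int]:
    """0-based positions in mrna that exactly match the reverse complement of
    mirna nucleotides start..end (1-based, inclusive), both written 5'->3'."""
    seed = mirna[start - 1:end]
    site = reverse_complement(seed)
    width = len(site)
    return [i for i in range(len(mrna) - width + 1) if mrna[i:i + width] == site]
```

For example, with the miRNA `UGAGGUAGUAGGUUGUAUAGUU`, the 2-to-8 seed is `GAGGUAG`. Its complementary site, `CUACCUC`, is found at position 3 of `AAACUACCUCAAA`.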
The Vaccine Investigation and OnLine Information Network (VIOLIN) is the largest web-based vaccine database and analysis system. VIOLIN currently contains over 3,000 vaccines or vaccine candidates for over 190 pathogens. The vaccine information in the database is collected by manual curation from over 1,600 peer-reviewed papers. Unlike most existing vaccine databases, VIOLIN focuses on vaccine research data. Many types of information are curated, including vaccine name, license status, antigens used, vaccine adjuvants, vaccine vectors, vaccination procedure, host immune response, challenge procedure, vaccine efficacy, and adverse events. All vaccine information in the VIOLIN vaccine database is supported by quoted references. Data generated by a curator are published only after careful review and approval by a vaccine domain expert.
PhytoPath was a joint scientific project between the European Bioinformatics Institute and Rothamsted Research, running from January 2012 to May 30, 2017. The project aimed to enable the exploitation of the growing body of “-omics” data being generated for phytopathogens, their plant hosts and related model species. Gene mutant phenotypic information is directly displayed in genome browsers.
The Eukaryotic Pathogen Database, or EuPathDB, is a database of bioinformatic and experimental data related to a variety of eukaryotic pathogens. It was established in 2006 under a National Institutes of Health program to create Bioinformatics Resource Centers to facilitate research on pathogens that may pose biodefense threats. EuPathDB stores data related to its organisms of interest and provides tools for searching and analyzing the data. It currently consists of 14 component databases, each dedicated to a certain research topic. EuPathDB includes:
PomBase is a model organism database that provides online access to the fission yeast Schizosaccharomyces pombe genome sequence and annotated features, together with a wide range of manually curated functional gene-specific data. The PomBase website was redeveloped in 2016 to provide users with a more fully integrated, better-performing service.
Model organism databases (MODs) are biological databases, or knowledgebases, dedicated to the provision of in-depth biological data for intensively studied model organisms. MODs allow researchers to easily find background information on large sets of genes, plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses. They allow users to analyse results and interpret datasets, and the data they generate are increasingly used to describe less well studied species. Where possible, MODs share common approaches to collect and represent biological information. For example, all MODs use the Gene Ontology (GO) to describe functions, processes and cellular locations of specific gene products. Projects also exist to enable software sharing for curation, visualization and querying between different MODs. Organismal diversity and varying user requirements however mean that MODs are often required to customize capture, display, and provision of data.
Canto is a web-based tool to support the curation of gene-specific scientific data, by both professional biocurators and publication authors. Canto was developed as part of the PomBase project, and is funded by the Wellcome Trust.
Biocuration is the field of life sciences dedicated to organizing biomedical data, information and knowledge into structured formats, such as spreadsheets, tables and knowledge graphs. The biocuration of biomedical knowledge is made possible by the cooperative work of biocurators, software developers and bioinformaticians and is at the base of the work of biological databases.