733422
BBI0010.1177/1177932217733422Bioinformatics and Biology InsightsCarels et al
research-article2017
A Metagenomic Analysis of Bacterial Microbiota in
the Digestive Tract of Triatomines
Nicolas Carels1, Marcial Gumiel2, Fabio Faria da Mota3,
Carlos José de Carvalho Moreira4 and Patricia Azambuja2,5
Bioinformatics and Biology Insights
Volume 11: 1–19
© The Author(s) 2017
Reprints and permissions:
sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/1177932217733422
https://doi.org/10.1177/1177932217733422
1Laboratório de Modelagem de Sistemas Biológicos, National Institute for Science and
Technology on Innovation in Neglected Diseases (INCT-IDN), Centro de Desenvolvimento
Tecnológico em Saúde (CDTS), Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, Brazil.
2Laboratório de Bioquímica e Fisiologia de Insetos, Instituto Oswaldo Cruz, Fundação Oswaldo
Cruz (IOC/FIOCRUZ), Rio de Janeiro, Brazil. 3Laboratório de Biologia Computacional e Sistemas,
Instituto Oswaldo Cruz, Fundação Oswaldo Cruz (IOC/FIOCRUZ), Rio de Janeiro, Brazil.
4Laboratório de Doenças Parasitárias, Fundação Oswaldo Cruz (IOC, FIOCRUZ), Rio de Janeiro,
Brazil. 5Departamento de Entomologia Molecular, Instituto Nacional de Entomologia Molecular
(INCT-EM), Rio de Janeiro, Brazil.
ABSTRACT: The digestive tract of triatomines (DTT) is an ecological niche favored by microbiota whose enzymatic profile is adapted to the
specific substrate availability in this medium. This report describes the molecular enzymatic properties that promote bacterial prominence in
the DTT. The microbiota composition was assessed previously based on 16S ribosomal DNA, and whole sequenced genomes of bacteria from
the same genera were used to calculate the GC level of rare and prominent bacterial species in the DTT. The enzymatic reactions encoded by
coding sequences of both rare and common bacterial species were then compared and revealed key functions explaining why some genera
outcompete others in the DTT. Representativeness of DTT microbiota was investigated by shotgun sequencing of DNA extracted from bacteria
grown in liquid Luria-Bertani broth (LB) medium. Results showed that GC-rich bacteria outcompete GC-poor bacteria and are the dominant
components of the DTT microbiota. In addition, oxidoreductases are the main enzymatic components of these bacteria. In particular, nitrate
reductases (anaerobic respiration), oxygenases (catabolism of complex substrates), acetate-CoA ligase (tricarboxylic acid cycle and energy
metabolism), and kinase (signaling pathway) were the major enzymatic determinants present together with a large group of minor enzymes
including hydrogenases involved in energy and amino acid metabolism. In conclusion, despite their slower growth in liquid LB medium, bacteria
from GC-rich genera outcompete the GC-poor bacteria because their specific enzymatic abilities impart a selective advantage in the DTT.
KEYWORDS: GC content, genome size, gene number, EC number, ecological niche, midgut
RECEIVED: December 23, 2016. ACCEPTED: April 10, 2017.
PEER REVIEW: Five peer reviewers contributed to the peer review report. Reviewers’
reports totaled 878 words, excluding any conidential comments to the academic editor.
INCT-IDN. The funders had no role in study design, data collection and analysis, decision
to publish, or preparation of the manuscript.
DECLARATION OF CONFLICTING INTERESTS: The author(s) declared no potential
conlicts of interest with respect to the research, authorship, and/or publication of this article.
TYPE: Original Research
FUNDING: The author(s) disclosed receipt of the following inancial support for the
research, authorship, and/or publication of this article: N.C. received a grant (PAPES-V)
from Fiocruz/CNPq (http://portal.iocruz.br/pt-br; http://cnpq.br/) and M.G. a PhD
scholarship from CAPES (http://www.capes.gov.br/). The publication fees were paid by
CORRESPONDING AUTHOR: Nicolas Carels, Laboratório de Modelagem de Sistemas
Biológicos, National Institute for Science and Technology on Innovation in Neglected
Diseases (INCT-IDN), Centro de Desenvolvimento Tecnológico em Saúde (CDTS),
Fundação Oswaldo Cruz (FIOCRUZ), Av. Brasil 4036, Manguinhos, 21040-361 - RJ, Rio
de Janeiro, Brazil. Email:
[email protected]
Introduction
Chagas disease remains a serious health concern in South
American countries, with approximately 8 million people in
the chronic phase of this parasitosis. Trypanosoma cruzi, the
causative agent, is mainly transmitted to humans by insects
from the Triatominae subfamily distributed throughout the
American continent.1
The host-parasite relationship between T cruzi and vertebrate
hosts has been extensively studied and studies continue to develop
new drugs and vaccines.2,3 In contrast, there are still few studies on
the host-parasite relationship involving T cruzi and its interaction
with the microbiota in the triatomine vector gut. Pioneer work by
Azambuja et al4 showed that Serratia marcescens, belonging to the
family Enterobacteriaceae, is a major component of the bacterial
microbiota in the digestive tract of triatomines (DTT) that may
kill T cruzi through mannose-sensitive fimbriae5,6 and could thus
affect the epidemiology of Chagas disease. An investigation of the
bacterial composition in the DTT was only recently undertaken at
a molecular level through 16S ribosomal DNA (rDNA) characterization by da Mota et al7 and Gumiel et al.8 These 2 investigations found that the bacterial microbiota diversity is low (less than
10 major species) and varies in composition depending on the species of host triatomine. Apart from the intracellular endosymbiont
genera, Arsenophonus, Wolbachia, and Candidatus Rohrkolberia, the
major bacterial species found in the DTT were from Serratia genera and from the suborder Corynebacterineae (Mycobacterium,
Rhodococcus, Gordonia, Corynebacterium, and Dietzia). Another
microbiota with a low level of diversity has been described in
female mosquitoes (Anopheles gambiae and Aedes aegypti), which
are also hematophagous insects.9
This low number of major bacterial species found in the
DTT provides an opportunity to investigate their molecular
determinants. Aside from the major bacterial species mentioned above, da Mota et al7 and Gumiel et al8 also found several bacterial species that previously have not been reported to
Creative Commons Non Commercial CC BY-NC: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial
4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without
further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
Bioinformatics and Biology Insights
2
reproduce significantly in DTT. These species belong to the
following genera: Acinetobacter, Actinomyces, Adhaeribacter,
Bradyrhizobium, Chryseobacterium, Comamonas, Diaphorobacter,
Enterococcus, Erwinia, Geobacillus, Haemophilus, Hydrogenophilus,
Janthinobacterium, Marinomonas, Microvirga, Pectobacterium,
Propionibacterium, Providencia, Pseudomonas, Shinella,
Sphingomonas, Staphylococcus, Stenotrophomonas, Streptococcus,
Streptophyta, Williamsia, and Xanthobacter. If some bacterial
species reproduce optimally in the DTT, there must be a biochemical basis and it needs elucidation due to the possibility of
controlling Chagas disease through paratransgenesis to reduce
vector competence with genetically modified symbionts.10,11
The aim of this investigation was to identify enzymatic determinants resulting in the success of bacterial genera in the DTT,
such as Serratia and the members of Corynebacterineae, in contrast to minor genera, including Staphylococcus, Streptococcus,
Haemophilus, or Enterococcus. Thus, the biochemical differences
were analyzed between the GC-rich (rich in guanine + cytosine)
and GC-poor bacterial species found in the DTT, emphasizing
specific enzymatic functions that could explain the success of the
former compared with the latter bacteria.12–16
Materials and Methods
The successive methodological steps followed to analyze the
genome properties and enzymatic determinants in the particular niche of DTT were as follows:
1. To set up a metagenomic model of wild microbiota in DTT
from species of bacteria having their genome completely
sequenced and being as close as possible (from the same
genera) to those identified by 16S rDNA sequencing;
2. To characterize the gross genomic features (genome size,
gene number, GC level, enzymatic annotation) from
bacteria of the metagenomic model;
3. To set up a working hypothesis based on the metagenomic
model to justify the enzymatic determinants that make
certain bacteria outcompete others in the DTT niche;
4. To validate the inference drawn from the metagenomic
model through a bench experiment using a quantitative
marker. The bench experiment was a shotgun sequencing
characterization of the bacterial population from DTTs
after Luria-Bertani broth (LB) culture, whereas the
quantitative marker was the ratio of the enzyme annotations that are overrepresented in GC-rich compared
with GC-poor bacterial species relative to the DNAencoded protein samples from these bacteria.
Nacional de Experimentação Animal/Ministério de Ciência e
Tecnologia. Triatomines were captured under the license L14323-7
given by the Sistema de Autorização e Informação em Biodiversidade
(SISBIO) of the Instituto Chico Mendes de Conservação da
Biodiversidade/Ministério do Meio Ambiente (MMA).
Triatomine colonies, gut dissection, and bacterial
cultures
Triatoma infestans, Triatoma vitticeps, Panstrongylus megistus
and Rhodnius neglectus are of epidemiologic importance.17
Dipetalogaster maximus has no epidemiologic importance, but
presents good susceptibility to T cruzi, is used in xenodiagnosis,
and is limited to Southern California and Mexico where it lives
in a rocky habitat in association with lizards.18 The male and
female triatomines used were in the fifth instar and maintained
on chicken blood over approximately 20 generations in the
Laboratório de Doenças Parasitárias (Instituto Oswaldo Cruz—
IOC, Fundação Oswaldo Cruz—FIOCRUZ. Triatomines were
dissected 7 to 10 days after feeding by opening the dorsal side
from the posterior end of the abdomen to the last thoracic segment. Meticulous dissection of the midgut (stomach and intestine) and hindgut (rectum) was performed using a sterile
ultrafine insulin syringe needle. Feces were obtained by abdominal compression or spontaneous ejections immediately after
feeding. Guts and feces were collected together in sterile
Eppendorf tubes and maintained at −20°C until use. All steps
were performed under aseptic conditions.
Three guts and their feces of each T infestans, T vitticeps, D
maximus, P megistus, and R neglectus were then incubated in 2-mL
Eppendorf tubes filled with liquid LB (Sigma-Aldrich Brasil
Ltda., Sao Paulo, Brazil) at 30°C without agitation for circa
48 hours until turbidity, due to bacterial growth, became evident.
DNA extraction from bacterial cultures
After incubation in LB medium for 48 hours at 30°C without
agitation in 2-mL Eppendorf tubes, the triatomine digestive
tracts were removed, and the DNA from the remaining bacterial
suspensions was extracted with the Fast DNA Spin Kit for Soil
(BIO 101 Systems; Qbiogene, Carlsbad, CA, USA) according to
the manufacturer’s protocol. DNA concentrations were determined using a NanoDrop spectrophotometer (Thermo Fisher
Scientific Inc., Waltham, MA, USA). About 1 µg DNA for each
sample of the 5 triatomine species was then amplified with a
Nextera DNA Library Preparation Kit (Illumina, San Diego,
CA, USA) and sequenced through 454 Titanium technology.
Ethics statement
The animals used for blood feeding the triatomines at FIOCRUZ
were treated according to the Ethical Principles in Animal
Experimentation approved by the Ethics Committee in Animal
Experimentation (CEUA/FIOCRUZ) under the license numbers LW-24/2013 and following the protocol from Conselho
Microbial composition of triatomine digestive tract,
sequence databases, and GC content
The predominant bacterial genera identified by denaturing
gradient gel electrophoresis (DGGE) in the digestive tract of T
infestans, T vitticeps, D maximus, P megistus, and R neglectus
Carels et al
were Serratia, Erwinia, Candidatus Rohrkolberia, Providencia,
Pectobacterium, and Arsenophonus.7 Of these 5 triatomine species, the genus Triatoma had a more diverse microbiota.
A previous more detailed study of the microbiota composition from digestive tracts of Triatoma brasiliensis collected in the
field8 showed that the most abundant bacterial genera found by
454 sequencing of 16S rDNA were Gordonia sp. (36%), Serratia
sp. (18%), Mycobacterium sp. (18%), Corynebacterium (6%), and
Rhodococcus sp. (6%). Serratia was the most widely distributed
genus among the triatomine species investigated here (Table 1).
Complete genomes were sequenced for at least one species
in most of the bacterial genera identified in this work (Table 1)
and can be downloaded from ftp://ftp.ncbi.nih.gov/genomes/
genbank/bacteria/. The coding sequences (CDS) were retrieved
from the sequences of these genomes, available in *.fna files (see
Table 1), by homologous comparison (tBLASTn) with their
protein sequences, available in *.faa files (“*” stands for the name
of the bacterial species under consideration). The average GC
content was then calculated for (1) whole CDS or for (2) the first
(GC1), second (GC2), and third (GC3) codon positions of each
genome set, using a Perl script. When a complete genome
sequence was not available for a given bacterial genus, as in the
case of Arsenophonus sp. and Dietzia sp., a CDS sample was
retrieved from GenBank (release 208—June 15, 2015) using the
Infobiogen server (see http://www.infobiogen.fr) and the
ACNUC/QUERY retrieval system19 with the options t = cds.
The CDS samples used here for the GCx (x = 1, 2, 3, or the average of them) calculations are from bacterial species that, in most
cases, were not the same as those diagnosed by da Mota et al7
and Gumiel et al,8 although the GC content obtained from
these species was considered representative of the genus to which
they belonged. Reference was made to Takahashi et al13 who
stated that the construction of phylogenetic trees based on oligonucleotide frequency of bacterial species with similar GC contents led to topologies that were congruent at genus and family
levels with those constructed from homologous genes. The bacterial genomes in GC-poor and GC-rich species were divided
according to whether their GC3 was lower or higher than 50%.
Enzymatic profiling of bacterial genomes used as
references
For this study, the classification Enzyme Commission (EC)
numbers from the Nomenclature Committee of the
International Union of Biochemistry and Molecular Biology
(NC-IUBMB) was used. If different enzymes catalyze the
same reaction, then they receive the same EC number.
Approximately 5500 enzyme reactions have already been classified according to a 4-digit hierarchy that is used to progressively refine classification descriptions. Briefly, the first digit
reports on the type of reaction considered, which is divided
into 6 main categories: Oxidoreductases, transferases, hydrolases,
lyases, isomerases, and ligases. The second digit of the EC number describes the type of chemical object the reaction is acting
3
on, whereas the third digit often describes the type of donor or
acceptor group. Finally, the fourth digit associates the enzyme
with its reaction name (see a complete description at http://
www.chem.qmul.ac.uk/iubmb/enzyme/).
Protein sequences of files (1) NC_018221.faa (Enterococcus
faecalis), NC_010519.faa (Haemophilus somnus), NC_007350.faa
(Staphylococcus saprophyticus), NC_015291.faa (Streptococcus oralis) and (2) NC_010612.faa (Mycobacterium marinum),
NC_012522.faa (Rhodococcus opacus), NC_016906.faa (Gordonia
polyisoprenivorans), NC_020064.faa (Serratia marcescens), and
NC_021663.faa (Corynebacterium terpenotabidum) were considered representative of GC-poor and GC-rich bacterial species
found in the DTT, respectively.8 By taking only the best hits (E
value < 0.0001) into account, the protein sequences (BLASTp)
of each of the files outlined above were compared with the
enzyme sequences from the database of the Kyoto Encyclopedia
of Genes and Genomes (KEGG version from June 2015; http://
www.genome.jp/kegg/) where the EC numbers are available. A
homologous hit was considered significant when its identity rate
was at least 60% for at least 33 amino acids. For each file of the
homology comparison, the EC numbers (http://www.enzymedatabase.org/class.php) were grouped according to their first (6
classes), second (67 subclasses), third (264 subclasses), and fourth
digits (the whole EC number set of 5549 approved enzymes as
available from BRENDA—online release as of June, 30, 2015;
http://www.brenda-enzymes.org/all_enzymes.php; see the
“Results” section). Finally, the relative frequency of EC numbers
per functional category were compared between GC-poor and
GC-rich genomes of bacterial species considered representative
of the bacterial genera diagnosed in DTT.
Shotgun sequencing and analysis of DNA from
bacterial cultures of triatomine guts
The shotgun sequencing of bacteria from DTT incubated in
LB medium was performed to determine the validity of the
model proposed above. At the time of the experiment, a representative amplification by polymerase chain reaction would
have needed an amount of DNA that was not compatible with
that obtained from direct extraction of triatomine feces. Thus,
a culture step was introduced prior to shotgun library construction knowing that it would introduce a bias due to the different
growth conditions in LB and DTT.
The 723 543 reads obtained by 454 sequencing were mounted
into 16 435 contigs using Velvet20 according to http://ged.msu.
edu/angus/tutorials-2011/short-read-assembly-velvet.html
(k = 31, ie, 31mers were looking for overlaps between reads) and
further assembled with CAP321 to finally obtain 14 269 nonredundant sequences (supplementary file S1). We then extracted
the 738 297 open reading frames (ORF) from both positive and
negative strands of these 14 269 sequences and filtered them out
for CDS ORFs (cORFs) larger than 99 bp (base pairs) using the
universal feature method (UFM),22,23 ending up with 35 105
cORFs compatible with the purine bias found in the CDSs
Bioinformatics and Biology Insights
4
Table 1. Sequence materials for GC and enzymatic characterization.
GENERA
FREq.
WhOLE GENOME SEqUENCES
GC
Acinetobactera
2
NA
Actinomycesa
1
NA
Adhaeribactera
1
NA
Arsenophonusb
Endo
GenBank
Poor
Bradyrhizobiuma
1
Bradyrhizobium_japonicum_USDA_6_uid158851/NC_017249.fna
Rich
Chryseobacteriuma
1
NA
Comamonasa
1
Comamonas_testosteroni_CNB_2_uid62961/NC_013446.fna
Rich
Corynebacteriuma
2015
Corynebacterium_terpenotabidum_Y_11_uid210639/NC_021663.fna
Rich
Diaphorobactera
2
NA
Dietziaa
5008
GenBank
Rich
Enterococcusa
6
Enterococcus_faecalis_D32_uid171261/NC_018221.fna
Poor
Erwiniab
Low
Erwinia_amylovora_ATCC_49946_uid46943/NC_013971.fna
Rich
Geobacillusa
2
NA
Gordoniaa
11825
Gordonia_polyisoprenivorans_Vh2_uid86651/NC_016906.fna
Rich
Haemophilusa
1
haemophilus_somnus_2336_uid57979/NC_010519.fna
Poor
Hydrogenophilusa
15
NA
Janthinobacteriuma
1
NA
Marinomonasa
1
Marinomonas_posidonica_IVIA_Po_181_uid67323/NC_015559.fna
Microvirgaa
2
NA
Mycobacteriuma
5737
Mycobacterium_marinum_M_uid59423/NC_010612.fna
Rich
Pectobacteriumb
Low
Pectobacterium_carotovorum_PC1_uid59295/NC_012917.fna
Rich
Propionibacteriuma
5
Propionibacterium_propionicum_F0230a_uid170533/NC_018142.fna
Rich
Pseudomonasa
4
Pseudomonas_aeruginosa_RP73_uid209328/NC_021577.fna
Rich
Rhodococcusa
1855
Rhodococcus_opacus_B4_uid13791/NC_012522.fna
Rich
Serratiaa
4917
Serratia_marcescens_FGI94_uid185180/NC_020064.fna
Rich
Shinellaa
2
NA
Sphingomonasa
1
Sphingomonas_wittichii_RW1_uid58691/NC_009511.fna
Rich
Staphylococcusa
3
Staphylococcus_saprophyticus_ATCC_15305_uid58411/NC_007350.fna
Poor
Stenotrophomonasa
1
Stenotrophomonas_maltophilia_D457_uid162199/NC_017671.fna
Rich
Streptococcusa
1
Streptococcus_oralis_Uo5_uid65449/NC_015291.fna
Poor
Streptophytaa
2
NA
Williamsiaa
15
NA
Wolbachiab
Endo
Wolbachia_endosymbiont_of_Culex_quinquefasciatus_Pel_uid61645/
NC_010981.fna
Poor
Xanthobactera
1
Xanthobacter_autotrophicus_Py2_uid58453/NC_009720.fna
Rich
Poor
Abbreviation: NA, not available.
aBacterial species detected by Gumiel et al.8
bBacterial species detected by da Mota et al.7
Freq. is for the numbers in Table 2 of Gumiel et al8 indicating the maximum absolute number of times a genus was detected over each of 4 samples. Low is for the low but
uncharacterized level of detection of a genus by DGGE in da Mota et al.7 Endo is for the high but uncharacterized level of detection of an endosymbiont by DGGE in da Mota et al.7
Carels et al
5
(supplementary file S2). These sequences were then compared
(BLASTx) with a data set composed by the protein sequences of
GC-rich bacteria from the whole genomes of C terpenotabidum
(NC_021663.fna), G polyisoprenivorans (NC_016906.fna), M
marinum (NC_010612.fna), R opacus (NC_012522.fna), and S
marcescens (NC_020064.fna), downloaded from ftp://ftp.ncbi.
nih.gov/genomes/Bacteria/ and filtered out the homologies for
identity rates≥60% over ≥33 amino acids. We retrieved the
sequence subset corresponding to these homologies with a Perl
script and compared (BLASTx) them with the subset of protein
sequences from KEGG corresponding to the list of EC numbers that are more frequent in GC-rich bacteria compared with
GC-poor forms according to their first 3 digits. Finally, we did
the same exercise with the protein sequences from the whole
genomes of GC-poor bacteria, ie, S saprophyticus (NC_007350.
fna), H somnus (NC_010519.fna), E faecalis (NC_018221.fna),
and S oralis (NC_015291.fna), and compared the results.
Statistics
Due to the different environmental conditions, differential growth
of the GC-poor and GC-rich bacteria in DTT and LB were to be
expected. Thus, a marker is needed to verify that the model
matches the bench experiment (shotgun sequences). Therefore, as
a marker, the ratio (proportion) of (1) enzymatic functions that are
overrepresented in GC-rich bacteria compared with GC-poor
ones relative to (2) the whole DNA–encoded protein sample for
the type of bacteria considered (GC-poor or GC-rich) were chosen. Because the relative ratio of these enzymatic functions is different in GC-rich and GC-poor bacteria, then rejection of the
null hypothesis of equality of both proportions (1 for GC-rich and
1 for GC-poor bacteria) in the bench experiment is to be expected
if it mirrors the model (where both proportions are different).
There are at least 4 different methods to test the equality
of 2 proportions, but 1 based on the Z score (https://onlinecourses.science.psu.edu/stat414/node/268) is presented here.
According to this method, the hypothesis of equality of 2 proportions H0: p1 = p2 can be rejected if a quantity, Z, is larger than
a theoretical value (1.96) of reference for a probability risk
α = 0.05. The quantity Z is calculated using formula (1):
( p 1 − p 2 )
Z=
p
( 1 − p ) n1 + n1
1
(1)
2
where,
Y + Y2
p = 1
n1 + n2
(2)
is the proportion of successes in the 2 samples combined (Y1
and Y2 are the absolute frequency of success in samples 1 and 2,
respectively. The sample sizes of Y1 and Y2 are referred to as n1
and n2, respectively).
Results
In genera of bacteria isolated from Triatoma spp., those with
GC-rich genomes surpass in relative number (75%) the genera
of bacteria with GC-poor genomes (25%).8 In addition, among
the genera described by Gumiel et al,8 the most widely represented include species with GC-rich genomes, whereas the
genera only marginally represented include bacterial species
with GC-poor genomes (Table 2). Obviously, being GC-rich
is not sufficient for a bacterium to outperform others present in
the gut of triatomines because 45% of the other minor bacterial
species were also GC-rich.8 However, because all outperforming bacteria (6 belonging to 6 genera in 6 different families
with 5 from Actinomycetales and 1 from Enterobacteriales)
were GC-rich, the possibility of the GC level being a key factor
for these bacteria in the DTT cannot be ignored (Table 2).
On comparing GC3 with genome size and gene number for
the species of the genera identified by Gumiel et al8 for which
a complete genome sequence was available, a significant positive correlations was found for GC3 vs genome size (r = .66,
P < .01), GC3 vs gene number (r = .61, P < .01), and genome size
vs gene number (r = .99, P < .01) (Table 3).
If the last correlation may seem trivial in bacteria, the first
one is not (the second is a consequence of the first given the
third). As a consequence of the positive correlation between
GC3 and genome size, GC-rich genomes of DTT bacterial
microbiota have a potentially more complex metabolism than
that of GC-poor genomes, which seems to be an advantage in
this system. A more careful analysis of Table 3 shows that
Corynebacterium has a small genome (at least in the species
considered here), but this fact is not necessarily a contradiction
because several other Corynebacterineae in DTT have large
genomes. It simply suggests that Corynebacterium (at least the
species considered here) may be in a process of genome reduction on the basis of the enzymatic apparatus of the family.
When comparing enzymatic activities in GC-poor and
GC-rich bacteria through the evaluation of their relative frequency according to the first digit of the EC numbers (Table
4), oxidoreductases might explain the success of GC-rich bacteria because they were 2 times more frequent, on average, than
in GC-poor ones.
According to this observation, the enzymatic comparison of
the second digit (Table 5) revealed a larger number of subcategories (6, ie, acting on the CH-CH group of donors—EC:1.3.-.-, acting on the CH-NH2 group of donors—EC:1.4.-.-, acting on single
donors with incorporation of molecular oxygen—EC:1.13.-.-, acting
on paired donors with incorporation or reduction of molecular oxygen—EC:1.14.-.-, acting on iron-sulfur proteins as donors—
EC:1.18.-.-) with a larger EC number frequency (≥2 times more
frequent) in GC-rich compared with GC-poor bacteria in oxidoreductases than in the other 4 categories of the first digit level,
ie, 2 (acting on ether bonds—EC:3.3.-.-, acting on carbon-carbon
bonds—EC:3.7.-.-) in hydrolases, 1 (intramolecular lyases—
EC:5.5.-.-) in isomerases, and 1 (forming carbon-sulfur bonds—
EC:6.2.-.-) in ligases. The largest differences of EC relative
6
Table 2. GC content of coding sequences associated with bacterial genera in digestive tract of triatomines.
PhYLUM
CLASS
ORDER
SUBORDER
FAMILY
SPECIES
N
GC
CLASS
GC, %
σGC
GC1, % σGC1
GC2, % σGC2
GC3, % σGC3
FRa
Firmicutes
Bacilli
Bacillales
NA
Staphylococcaceae
Staphylococcus
saprophyticus
1844
Poor
33.8
4.6
46.8
5.4
31.7
4.6
22.9
3.8
Lb
Proteobacteria
α-Proteobacteria
Rickettsiales
NA
Anaplasmataceae
Wolbachia sp.
1037
Poor
34.4
4.5
44.3
4.8
32.9
4.3
26.1
4.3
Endoc
Proteobacteria
γ-Proteobacteria
Pasteurellales
NA
Pasteurellaceae
Haemophilus somnus
1601
Poor
37.5
5.1
49.5
5.1
34.9
4.5
28.3
5.7
L
Firmicutes
Bacilli
Lactobacillales
NA
Enterococcaceae
Enterococcus faecalis
2249
Poor
37.8
4.7
49.4
5.0
34.3
4.7
29.8
4.5
L
Firmicutes
Bacilli
Lactobacillales
NA
Streptococcaceae
Streptococcus oralis
1451
Poor
41.3
5.9
52.2
5.8
34.0
4.6
37.7
7.2
L
Proteobacteria
γ-Proteobacteria
Oceanospirillales
NA
Oceanospirillaceae
Marinomonas
posidonica
2795
Poor
44.6
4.6
54.0
4.4
37.9
3.9
41.7
5.3
L
γ-Proteobacteria
Enterobacteriales
NA
Enterobacteriaceae
Arsenophonus nasoniae 306
Poor
42.3
6.0
45.2
5.9
39.6
6.0
42.2
6.1
Endo
γ-Proteobacteria
Enterobacteriales
NA
Enterobacteriaceae
Pectobacterium
carotovorum
3322
Rich
52.2
6.4
59.0
6.0
40.9
4.7
56.7
8.6
L
Proteobacteria
γ-Proteobacteria
Enterobacteriales
NA
Enterobacteriaceae
Erwinia amylovora
2478
Rich
53.9
6.7
60.2
6.4
41.7
4.9
59.6
9.0
L
Actinobacteria
Actinobacteria
Actinomycetales
Corynebacterineae
Dietziaceae
Dietzia spp.
157
Rich
68.0
4.7
69.1
5.3
66.3
4.3
68.6
4.4
hd
Proteobacteria
γ-Proteobacteria
Enterobacteriales
NA
Enterobacteriaceae
Serratia marcescens
3620
Rich
59.4
7.5
63.2
6.6
42.7
5.1
72.5
10.8
h
Proteobacteria
β-Proteobacteria
Burkholderiales
NA
Comamonadaceae
Comamonas testosteroni 3941
Rich
62.0
6.2
65.2
5.3
45.9
4.9
74.8
8.3
L
Actinobacteria
Actinobacteria
Actinomycetales
Corynebacterineae
Mycobacteriaceae
Mycobacterium marinum 4464
Rich
65.6
5.4
68.0
4.9
50.6
5.5
78.0
5.8
h
Proteobacteria
α-Proteobacteria
Rhizobiales
NA
Bradyrhizobiaceae
Bradyrhizobium
japonicum
7029
Rich
64.0
6.1
65.1
4.9
47.9
4.9
78.9
8.6
L
Actinobacteria
Actinobacteria
Actinomycetales
Corynebacterineae
Gordoniaceae
Gordonia
polyisoprenivorans
3870
Rich
67.1
5.4
69.0
5.0
51.1
5.2
81.3
6.0
h
Actinobacteria
Actinobacteria
Actinomycetales
NA
Propionibacteriaceae
Propionibacterium
propionicum
2292
Rich
66.4
6.3
68.4
5.5
49.2
5.4
81.6
8.0
L
Proteobacteria
γ-Proteobacteria
Xanthomonadales
NA
Xanthomonadaceae
Stenotrophomonas
maltophilia
3275
Rich
67.2
5.7
69.8
5.6
48.8
5.4
83.0
6.2
L
Actinobacteria
Actinobacteria
Actinomycetales
Corynebacterineae
Corynebacteriaceae
Corynebacterium
terpenotabidum
1861
Rich
67.3
5.4
68.8
5.1
49.7
5.3
83.3
5.7
h
Actinobacteria
Actinobacteria
Actinomycetales
Corynebacterineae
Nocardiaceae
Rhodococcus opacus
5908
Rich
67.9
5.7
69.7
4.9
50.6
5.3
83.5
6.7
h
Proteobacteria
α-Proteobacteria
Rhizobiales
NA
Xanthobacteraceae
Xanthobacter
autotrophicus
3783
Rich
67.6
6.0
69.1
5.6
49.7
5.3
84.0
7.3
L
Proteobacteria
γ-Proteobacteria
Pseudomonadales
NA
Pseudomonadaceae
Pseudomonas
aeruginosa
4627
Rich
66.6
6.4
68.7
5.8
46.8
5.8
84.3
7.5
L
Proteobacteria
α-Proteobacteria
Sphingomonadales
NA
Sphingomonadaceae
Sphingomonas wittichii
4069
Rich
68.7
5.9
69.3
5.6
50.0
5.2
86.7
6.9
L
Abbreviation: NA, not available.
aFr is for the frequency of bacterial species reported in Gumiel et al.8
bL is for low frequency.
cEndo is for endosymbiont.
dh is for high frequency and is highlighted in gray background.
The shading regions in Table 2 is to improve the contrast between GC-rich (gray) and GC-poor (white) genomes.
Bioinformatics and Biology Insights
Proteobacteria
Proteobacteria
Carels et al
7
Table 3. Relationships between GC3, genome size, and gene number in the representative bacterial species with complete genome sequence of
bacterial genera found in the intestinal tract of triatomines.
SPECIES
GC3
GENOME, BP
GENE, NB
Staphylococcus saprophyticus (NC_007350)
22.9
2 516 573
2445
Haemophilus somnus (NC_010519)
28.3
2 263 855
1980
Enterococcus faecalis (NC_018221)
29.8
2 987 449
2876
Streptococcus oralis (NC_015291)
37.7
1 958 688
1905
Marinomonas posidonica (NC_015559)
41.7
3 899 938
3491
Pectobacterium carotovorum (NC_012917)
56.7
4 862 911
4246
Erwinia amylovora (NC_013971)
59.6
3 805 872
3437
Serratia marcescens (NC_020064)
72.5
4 858 215
4361
Comamonas testosteroni (NC_013446)
74.8
5 373 642
4802
Mycobacterium marinum (NC_010612)
78.0
6 636 826
5423
Bradyrhizobium japonicum (NC_017249)
78.9
9 207 382
8826
Gordonia polyisoprenivorans (NC_013441)
81.3
5 669 804
4945
Propionibacterium propionicum (NC_018142)
81.6
3 449 358
2938
Stenotrophomonas maltophilia (NC_017671)
83.0
4 769 154
4101
Corynebacterium terpenotabidum (NC_021663)
83.3
2 751 232
2369
Rhodococcus opacus (NC_012522)
83.5
7 913 449
7246
Xanthobacter autotrophicus (NC_009720)
84.0
5 308 932
4746
Pseudomonas aeruginosa (NC_021577)
84.3
6 342 033
5762
Sphingomonas wittichii (NC_009511)
86.7
5 382 259
4850
—
—
Correlations
rGC3 × GenomSza
0.66
rGC3 × GeneNbb
—
0.61
—
rGenomSz × GeneNbc
—
—
0.99
aCorrelation
for GC3 vs genome size.
for GC3 vs gene number.
for genome size vs gene number.
The shading regions in Table 3 is to improve the contrast between GC-rich (gray) and GC-poor (white) genomes.
bCorrelation
cCorrelation
frequency, according to the second digit within those of the first
digit category, between GC-poor and GC-rich bacteria were
due to acting on single donors with incorporation of molecular oxygen—EC:1.13.-.- (difference of ~24 times) and acting on paired
donors with incorporation or reduction of molecular oxygen—
EC:1.14.-.- (difference of ~8 times). The differences due to acting on iron-sulfur proteins as donors—EC:1.18.-.- (difference of
~5 times), acting on ether bonds—EC:3.3.-.- (difference of
~6 times), acting on carbon-carbon bonds—EC:3.7.-.- (difference
of ~4 times), and forming carbon-sulfur bonds—EC:6.2.-.- (difference of ~4 times) were also relatively large (the other differences being around 2 times). Thus, the functional variability of
the enzymatic apparatus seems to be important for a bacterium
to be able to outperform the others in the intestinal environment
of triatomines. In addition, we found that even if the function
acting on diphenols and related substances as donors (EC:1.10.-.-)
exists only at a low rate in all GC-rich bacteria of Table 5, it is
simply absent from the GC-poor bacteria found in DTT. This
kind of function is an example of a larger metabolic complexity
in GC-rich bacteria in the DTT environment.
The comparison of the third digit place of EC numbers
(Table 6) for difference of enzymatic activity between GC-poor
and GC-rich bacteria of the DTT showed a consistently larger
number of enzymes involved in oxygen and nitrogen processing suggesting a much larger ability to cope with the degradation of complex substrates of higher chemical stability such as
those containing aromatic rings (aryls). These enzymes can be
mainly grouped under EC numbers 1.13.11.-, 1.4.3.-, 1.3.99.-,
and 1.14.99.- but also to a lesser extent in 1.13.12.-, 1.14.11.-,
1.14.12.-, 1.14.13.-, 1.1.99.-, and 1.7.99.-.
Bioinformatics and Biology Insights
Abbreviations: Ct, Corynebacterium terpenotabidum; EC no., Enzyme Commission number; Ef, Enterococcus faecalis; Gp, Gordonia polyisoprenivorans; hs, Haemophilus somnus; Mm, Mycobacterium marinum; Ro,
Rhodococcus opacus; Sm, Stenotrophomonas maltophilia; So, Streptococcus oralis; Ss, Staphylococcus saprophyticus.
difference where AvGCr is for average of GC-rich and AvGCp is for average of GC-poor.
The shading regions in Table 4 is to improve the contrast between GC-rich (gray) and GC-poor (white) genomes.
7.5
6. -.-.-
aFactor
Ligases
1.0
1.5
7.1
7.6
8.4
8.3
6.6
4.7
0.4
7.1
7.3
6.6
5.3
5. -.-.-
7.1
Isomerases
0.7
1.0
4.5
4.3
4.3
4.5
3.3
6.2
1.1
6.1
5.1
6.7
8.2
4. -.-.-
7.5
Lyases
1.1
0.6
9.3
8.9
9.3
9.6
8.5
10.1
1.4
8.1
7.2
6.9
29.3
3. -.-.-
10.0
hydrolases
0.7
3.8
22.8
19.8
25.3
19.6
21.2
28.3
3.9
31.5
33.9
35.5
31.9
2. -.-.-
27.2
Transferases
0.9
2.6
28.7
25.3
32.4
27.6
28.8
29.5
1.2
32.8
33.7
14.4
12.8
12.7
17.9
1. -.-.-
14.4
31.6
1.9
6.4
27.5
34.2
20.3
30.5
31.6
2.4
21.2
83.5
83.3
81.3
78.0
72.5
37.7
29.8
28.3
22.9
GC3,%
33.8
Oxidoreductases
CLASS OF ENZYME FUNCTION
AVGCR/AVGCPa
SD
AVERAGE
RO
CT
GP
MM
SM
SD
AVERAGE
SO
EF
hS
SS
ECNO.
Table 4. Relative frequency (%) of enzymatic functions in GC-poor and GC-rich bacterial species found in triatomine digestive tract according to the irst EC number digit.
8
With the list of EC numbers (n = 30) that are overrepresented in GC-rich bacteria according to the first 3 digits (Table
6), the protein sequences were retrieved corresponding to the
EC numbers fully described on the 4 digits (n = 778) in (1) S
saprophyticus (n = 39), H somnus (n = 25), E faecalis (n = 29), and
S oralis (n = 11), ie, 26 EC numbers on average (σ = 11.6) for
GC-poor bacteria and (2) C terpenotabidum (n = 55), G polyisoprenivorans (n = 96), M marinum (n = 73), R opacus (n = 116),
and S marcescens (n = 95), ie, 87 EC numbers on average
(σ = 23.0) for GC-rich bacteria. This statistic means that
GC-rich bacteria have enzymes with 3.3 times more enzymatic functionalities than GC-poor ones, on average, according to the list of Table 6, which sustains the hypothesis that
GC-rich bacteria outperform GC-poor bacteria because of
their more complex metabolism, which seems to be an advantage in the DTT. Most enzymatic activities were found in dehydrogenases (aldehyde and amino acid), oxygenases (mono and di),
and ligase (acetate-CoA), which are enzymatic activities involved
in the very first steps of molecular degradation and synthesis.
Table 7 shows that among 35 enzymatic reactions that are
overrepresented in GC-rich bacteria, the large majority are from
oxidoreductases (74%) followed by transferases and ligases (9%
each) with hydrolases and lyases in last position accounting for
only 6% and 3%, respectively. A closer look at Table 7 enables
understanding that the enzyme groups, which most explain the
differences between GC-rich and GC-poor bacteria, are ranked
by decreasing level of factor difference (AvGCr/AvGCp) and that
the data of Table 7 can be reorganized as shown in Table 8.
In conclusion, we can say from the divisions in Table 8 that
oxygenases (incorporation of oxygen in organic substrates) and
CoA ligases (a central function in energy storage) make up the
main significant differences, in comparison with the more basic
metabolic functions, between the GC-poor and GC-rich bacteria. In addition, this difference emphasizes the existence of a
larger metabolic variety of enzymatic systems in GC-rich bacteria than in GC-poor ones in the DTT. Also, the relative frequency of the enzymes ranked AvGCr/AvGCp < 2 is low in
GC-rich bacteria, but because they are absent in GC-poor bacteria, they are probably of significance too.
The shotgun sequencing of bacteria from DTT grown in LB
medium produced a total number of 723 543 readings whose size
followed a bimodal distribution with the significant mode at
350 bp (Figure 1A). The average size significantly increased after
contig assembling as can be seen in Figure 1B; however, most of
the ORFs remained below 100 bp (Figure 1C). Filtering of
cORFs with UFM resulted in a final sample of 35 105 cORFs
mostly in the range of 100 to 300 bp (Figure 1D), which is in the
acceptable limit to perform homology comparison.
From the shotgun sequence samples, 2233 significant
homologies (best hit) were found among the putative 35 105
cORFs with the representative species of the genera of
GC-rich bacteria reported by Gumiel et al.8 Among the
2233 sequences, 425 (19.0%) had significant homologies
Carels et al
Table 5. Relative frequency (%) of enzymatic functions in GC-poor and GC-rich bacterial species found in triatomine digestive tract according to the 2 irst EC number digits.
ECNO.
SS
hS
EF
SO
AVERAGE
GC3,%
22.9
28.3
29.8
37.7
1.3.-.-
1.3
1.4
1.2
1.3
1.3
1.4.-.-
0.9
0.2
0.3
0.3
1.10.-.-
0.0
0.0
0.0
1.13.-.-
0.0
0.1
1.14.-.-
0.8
1.18.-.-
SD
SM
MM
GP
CT
RO
AVERAGE
SD
AVGCR/
AVGCPa
CLASS OF ENZYME FUNCTION
72.5
78.0
81.3
83.3
83.5
0.1
1.6
5.1
4.6
2.8
5.2
3.8
1.6
2.9
Acting on the Ch-Ch group of donors
0.4
0.3
0.9
0.9
0.9
1.3
1.6
1.1
0.3
2.7
0.2
0.3
0.1
0.2
0.1
∞
Acting on the Ch-Nh2 group of donors (amino
acid oxidoreductase)
0.0
0.0
0.0
0.4
0.1
0.0
0.0
0.0
0.1
0.6
0.6
0.8
0.1
1.5
0.7
0.5
24.5
Acting on single donors with incorporation of
molecular oxygen (monooxygenases)
0.6
0.1
0.1
0.4
0.4
1.4
4.3
4.9
2.1
3.8
3.3
1.5
8.2
Acting on paired donors, with incorporation or
reduction of molecular oxygen (dioxygenases)
0.1
0.0
0.2
0.0
0.1
0.1
0.3
0.2
0.4
0.4
0.5
0.4
0.1
5.0
Acting on iron-sulfur proteins as donors
3.3.-.-
0.1
0.0
0.1
0.0
0.0
0.1
0.1
0.4
0.2
0.3
0.4
0.3
0.1
6.1
Acting on ether bonds
3.7.-.-
0.1
0.0
0.0
0.0
0.0
0.1
0.1
0.1
0.2
0.0
0.2
0.1
0.1
4.4
Acting on carbon-carbon bonds
5.5.-.-
0.0
0.1
0.0
0.1
0.1
0.1
0.2
0.1
0.2
0.1
0.1
0.1
0.0
2.2
Intramolecular lyases
6.2.-.-
0.6
0.5
0.5
0.1
0.4
0.2
0.5
2.3
1.7
0.6
3.3
1.7
1.2
4.0
Forming carbon-sulfur bonds
Acting on diphenols and related substances
as donors (diphenol oxidoreductases)
Abbreviations: Ct, Corynebacterium terpenotabidum; EC no., Enzyme Commission number; Ef, Enterococcus faecalis; Gp, Gordonia polyisoprenivorans; hs, Haemophilus somnus; Mm, Mycobacterium marinum; Ro,
Rhodococcus opacus; Sm, Stenotrophomonas maltophilia; So, Streptococcus oralis; Ss, Staphylococcus saprophyticus.
aFactor difference where Av
GCr is for average of GC-rich and AvGCp is for average of GC-poor.
The shading regions in Table 5 is to improve the contrast between GC-rich (gray) and GC-poor (white) genomes and is expected to improve table readability.
9
10
Table 6. Relative frequency (%) of enzymatic functions in GC-poor and GC-rich bacterial species found in the triatomine digestive tract according to the 3 irst EC number digits.
ECNO.
SS
hS
EF
SO
AVERAGE
GC3,%
22.9
28.3
29.8
37.7
1.1.2.-
0.0
0.0
0.0
0.0
0.0
1.1.3.-
0.0
0.0
0.1
0.1
1.1.99.-
0.1
0.0
0.1
1.2.1.-
1.5
0.9
1.3.99.-
0.3
1.4.1.-
SD
SM
MM
GP
CT
RO
AVERAGE
SD
72.5
78.0
81.3
83.3
83.5
0.0
0.1
0.2
0.1
0.1
0.1
0.1
0.0
0.2
0.3
0.0
0.1
0.1
0.5
0.1
0.9
0.3
0.9
0.5
1.5
0.6
0.2
0.1
0.3
0.2
0.6
0.1
0.3
0.1
0.3
1.4.3.-
0.0
0.1
0.0
0.0
1.4.4.-
0.2
0.0
0.0
1.5.3.-
0.0
0.0
1.6.1.-
0.0
1.7.99.-
AVGCR/
AVGCPa
CLASS OF ENZYME FUNCTION
0.0
0.1
0.1
∞
With a cytochrome as acceptor
0.0
0.4
0.2
0.2
2.8
With oxygen as acceptor
0.3
0.1
0.2
0.2
0.2
4.8
With unknown physiological acceptors
2.1
2.6
1.8
3.5
2.3
0.8
2.6
With NAD+ or NADP+ as acceptor
0.6
4.5
3.4
1.8
4.7
3.0
1.8
9.6
With unknown physiological acceptors
0.2
0.5
0.3
0.5
0.7
0.9
0.6
0.2
2.0
With NAD+ or NADP+ as acceptor
0.0
0.1
0.3
0.5
0.4
0.3
0.6
0.4
0.1
14.2
With oxygen as acceptor
0.0
0.1
0.1
0.1
0.2
0.0
0.3
0.1
0.1
0.1
2.8
With a disulide as acceptor
0.0
0.0
0.0
0.0
0.2
0.1
0.5
0.1
0.2
0.2
0.2
∞
With oxygen as acceptor
0.2
0.0
0.0
0.1
0.1
0.2
0.3
0.3
0.1
0.1
0.2
0.1
3.5
With NAD+ or NADP+ as acceptor
0.0
0.1
0.0
0.0
0.0
0.1
0.7
0.1
0.0
0.0
0.3
0.2
0.3
6.7
With unknown physiological acceptors
1.9.3.-
0.0
0.0
0.1
0.0
0.0
0.0
0.0
0.1
0.2
0.3
0.1
0.1
0.1
5.3
With oxygen as acceptor
1.13.11.-
0.0
0.1
0.0
0.0
0.0
0.1
0.5
0.4
0.6
0.0
1.2
0.5
0.4
17.5
With incorporation of 2 atoms of oxygen
1.13.12.-
0.0
0.0
0.0
0.0
0.0
0.0
0.1
0.3
0.3
0.1
0.5
0.2
0.2
∞
1.14.11.-
0.0
0.0
0.0
0.1
0.0
0.1
0.2
0.1
0.3
0.0
0.2
0.1
0.1
4.3
With 2-oxoglutarate as 1 donor, and incorporation of 1
atom each of oxygen into both donors
1.14.12.-
0.2
0.0
0.0
0.0
0.1
0.1
0.2
0.0
0.2
0.4
0.4
0.2
0.2
4.5
With NADh or NADPh as 1 donor, and incorporation of 2
atoms of oxygen into 1 donor
1.14.13.-
0.5
0.5
0.1
0.0
0.3
0.3
0.6
1.0
3.0
0.6
1.7
1.4
1.0
4.9
With NAD or NADh as 1 donor, and incorporation of 1
atom of oxygen
1.14.99.-
0.1
0.0
0.0
0.0
0.0
0.1
0.0
0.1
0.3
0.3
0.4
0.2
0.2
9.2
Miscellaneous
1.17.7.-
0.0
0.1
0.0
0.0
0.0
0.1
0.1
0.1
0.1
0.0
0.0
0.1
0.0
2.1
With an iron-sulfur protein as acceptor
1.18.1.-
0.1
0.0
0.2
0.0
0.1
0.1
0.2
0.2
0.4
0.4
0.6
0.4
0.2
5.1
With NAD+ or NADP+ as acceptor
2.7.11.-
0.4
0.0
0.2
0.3
0.2
0.2
0.2
0.8
1.1
0.6
1.3
0.8
0.4
3.6
Protein-serine/threonine kinases
2.8.3.-
0.1
0.2
0.0
0.0
0.1
0.1
0.2
0.7
0.5
0.6
1.4
0.7
0.4
7.7
CoA-transferases
With incorporation of 1 atom of oxygen (internal
monooxygenases or internal mixed function oxidases)
Bioinformatics and Biology Insights
Carels et al
Table 6. (Continued)
ECNO.
SS
hS
EF
SO
AVERAGE
GC3,%
22.9
28.3
29.8
37.7
2.10.1.-
0.0
0.1
0.0
0.0
0.0
3.1.6.-
0.0
0.2
0.0
0.1
3.3.2.-
0.1
0.0
0.1
4.3.3.-
0.0
0.0
5.1.99.-
0.0
5.3.3.-
SD
SM
MM
GP
CT
RO
AVERAGE
SD
AVGCR/
AVGCPa
CLASS OF ENZYME FUNCTION
72.5
78.0
81.3
83.3
83.5
0.1
0.1
0.1
0.1
0.0
0.0
0.1
0.0
2.1
Molybdenumtransferases or tungstentransferases with
sulide groups as acceptors
0.1
0.1
0.2
0.3
0.1
0.1
0.4
0.2
0.1
2.4
Sulfuric ester hydrolases
0.0
0.1
0.1
0.1
0.4
0.2
0.1
0.4
0.3
0.2
5.1
Ether hydrolases
0.1
0.0
0.0
0.0
0.3
0.0
0.0
0.1
0.0
0.1
0.1
4.1
Amine-lyases
0.0
0.0
0.0
0.0
0.0
0.1
0.1
0.3
0.1
0.3
0.2
0.1
∞
Acting on other compounds
0.1
0.0
0.1
0.1
0.1
0.1
0.3
0.4
0.3
0.1
0.2
0.3
0.1
3.2
Transposing C=C bonds
5.5.1.-
0.0
0.1
0.0
0.1
0.1
0.1
0.2
0.1
0.2
0.1
0.1
0.2
0.0
2.4
Miscellaneous
6.2.1.-
0.6
0.5
0.5
0.1
0.4
0.2
0.6
2.6
1.9
0.6
3.6
1.9
1.3
4.3
Miscellaneous
Abbreviations: Ct, Corynebacterium terpenotabidum; EC no., Enzyme Commission number; Ef, Enterococcus faecalis; Gp, Gordonia polyisoprenivorans; hs, Haemophilus somnus; Mm, Mycobacterium marinum; Ro,
Rhodococcus opacus; Sm, Stenotrophomonas maltophilia; So, Streptococcus oralis; Ss, Staphylococcus saprophyticus.
aFactor difference where Av
GCr is for average of GC-rich and AvGCp is for average of GC-poor.
The shading regions in Table 6 is to improve the contrast between GC-rich (gray) and GC-poor (white) genomes.
11
12
Table 7. Relative frequency (%) of enzymatic functions in GC-poor and GC-rich bacterial species found in the triatomine digestive tract according to the 4 EC number digits.
ECNO.
SS
hS
EF
SO
AVERAGE
GC3,%
22.9
28.3
29.8
37.7
1.1.99.1
1.0
0.0
0.0
0.0
0.3
1.2.1.2
4.0
1.0
1.0
0.0
1.2.1.3
3.0
0.0
1.0
1.2.1.7
1.0
1.0
1.2.1.8
1.0
1.2.1.10
SD
SM
MM
GP
CT
RO
AVERAGE
SD
AVGCR/
AVGCPa
CLASS OF ENZYME FUNCTION
72.5
78.0
81.3
83.3
83.5
0.5
2.0
1.0
1.0
1.0
2.0
1.4
0.5
5.6
Choline dehydrogenase
1.5
1.7
7.0
8.0
7.0
2.0
9.0
6.6
2.7
4.4
Formate dehydrogenase
0.0
1.0
1.4
3.0
4.0
5.0
2.0
14.0
5.6
4.8
5.6
Aldehyde dehydrogenase (NAD+)
0.0
0.0
0.5
0.6
1.0
3.0
2.0
2.0
1.0
1.8
0.8
3.6
Benzaldehyde dehydrogenase (NADP+)
0.0
0.0
0.0
0.3
0.5
5.0
0.0
1.0
1.0
2.0
1.8
1.9
7.2
Betaine-aldehyde dehydrogenase
0.0
0.0
3.0
0.0
0.8
1.5
1.0
1.0
1.0
0.0
5.0
1.6
1.9
2.1
Acetaldehyde dehydrogenase (acetylating)
1.2.1.16
1.0
1.0
0.0
0.0
0.5
0.6
1.0
1.0
2.0
1.0
2.0
1.4
0.5
2.8
Glyceraldehyde-3-phosphate dehydrogenase (NADP+)
(phosphorylating)
1.2.1.27
0.0
1.0
0.0
0.0
0.3
0.5
1.0
1.0
3.0
0.0
5.0
2.0
2.0
8.0
Methylmalonate-semialdehyde dehydrogenase (acylating)
1.2.1.38
1.0
0.0
0.0
0.0
0.3
0.5
1.0
0.0
1.0
1.0
1.0
0.8
0.4
3.2
N-acetyl-g-glutamyl-phosphate reductase
1.2.1.70
1.0
1.0
0.0
0.0
0.5
0.6
1.0
1.0
1.0
1.0
1.0
1.0
0.0
2.0
Glutamyl-tRNA reductase
1.3.99.1
2.0
3.0
1.0
0.0
1.5
1.3
8.0
6.0
9.0
2.0
7.0
6.4
2.7
4.3
Succinate dehydrogenase
1.4.1.1
3.0
0.0
1.0
0.0
1.0
1.4
3.0
4.0
6.0
4.0
8.0
5.0
2.0
5.0
Alanine dehydrogenase
1.4.1.2
2.0
0.0
0.0
0.0
0.5
1.0
2.0
1.0
0.0
0.0
6.0
1.8
2.5
3.6
Glutamate dehydrogenase
1.4.1.13
2.0
0.0
1.0
0.0
0.8
1.0
3.0
3.0
5.0
2.0
6.0
3.8
1.6
5.1
Glutamate synthase (NADPh)
1.4.3.1
0.0
0.0
0.0
0.0
0.0
0.0
1.0
1.0
1.0
2.0
3.0
1.6
0.9
∞
D-aspartate
1.4.4.2
2.0
0.0
0.0
0.0
0.5
1.0
2.0
3.0
0.0
2.0
3.0
2.0
1.2
4.0
Glycine dehydrogenase (decarboxylating)
1.5.3.1
0.0
0.0
0.0
0.0
0.0
0.0
4.0
1.0
5.0
1.0
5.0
3.2
2.0
∞
Sarcosine oxidase
1.6.1.2
0.0
2.0
0.0
0.0
0.5
1.0
2.0
2.0
3.0
1.0
3.0
2.2
0.8
4.4
NAD(P)+ transhydrogenase (Re/Si-speciic)
1.7.99.4
0.0
1.0
0.0
0.0
0.3
0.5
12.0
1.0
0.0
0.0
6.0
3.8
5.2
15.2
Nitrate reductase
1.13.11.2
0.0
1.0
0.0
0.0
0.3
0.5
3.0
4.0
2.0
0.0
11.0
4.0
4.2
16.0
Catechol 2,3-dioxygenase
1.13.11.24
0.0
1.0
0.0
0.0
0.3
0.5
2.0
2.0
0.0
0.0
3.0
1.4
1.3
5.6
quercetin 2,3-dioxygenase
1.14.12.1
2.0
0.0
0.0
0.0
0.5
1.0
3.0
0.0
1.0
1.0
3.0
1.6
1.3
3.2
Anthranilate 1,2-dioxygenase (deaminating, decarboxylating)
1.14.13.1
1.0
0.0
0.0
0.0
0.3
0.5
5.0
1.0
4.0
1.0
6.0
3.4
2.3
13.6
1.14.13.8
0.0
0.0
0.0
0.0
0.0
0.0
1.0
1.0
4.0
1.0
3.0
2.0
1.4
∞
Dimethyl aniline monooxygenase (N-oxide-forming)
1.14.99.3
1.0
0.0
0.0
0.0
0.3
0.5
0.0
1.0
2.0
2.0
2.0
1.4
0.9
5.6
heme oxygenase (decyclizing)
oxidase
Bioinformatics and Biology Insights
Salicylate 1-monooxygenase
Carels et al
Table 7. (Continued)
ECNO.
SS
hS
EF
SO
AVERAGE
GC3,%
22.9
28.3
29.8
37.7
1.17.7.1
0.0
1.0
0.0
0.0
0.3
1.18.1.2
1.0
0.0
2.0
0.0
1.18.1.3
0.0
0.0
0.0
2.7.11.1
1.0
0.0
2.8.3.1
0.0
2.10.1.1
SD
SM
MM
GP
CT
RO
AVERAGE
SD
AVGCR/
AVGCPa
CLASS OF ENZYME FUNCTION
72.5
78.0
81.3
83.3
83.5
0.5
1.0
2.0
1.0
0.0
1.0
1.0
0.7
4.0
4-hydroxy-3-methylbut-2-en-1-yldiphosphate synthase
0.8
1.0
2.0
1.0
3.0
2.0
5.0
2.6
1.5
3.5
Ferredoxin—NADP+ reductase
0.0
0.0
0.0
1.0
1.0
1.0
1.0
6.0
2.0
2.2
∞
Ferredoxin—NAD+ reductase
1.0
2.0
1.0
0.8
2.0
12.0
12.0
4.0
29.0
11.8
10.6
11.8
Nonspeciic serine/threonine protein kinase
0.0
0.0
0.0
0.0
0.0
1.0
9.0
4.0
1.0
17.0
6.4
6.8
∞
Propionate CoA-transferase
0.0
1.0
0.0
0.0
0.3
0.5
1.0
2.0
1.0
0.0
1.0
1.0
0.7
4.0
Molybdopterin molybdotransferase
3.1.6.1
0.0
1.0
0.0
1.0
0.5
0.6
2.0
5.0
0.0
1.0
7.0
3.0
2.9
6.0
Arylsulfatase
3.3.2.1
1.0
0.0
1.0
0.0
0.5
0.6
1.0
4.0
1.0
0.0
1.0
1.4
1.5
2.8
Isochorismatase
4.3.3.7
0.0
0.0
1.0
0.0
0.3
0.5
5.0
0.0
0.0
1.0
0.0
1.2
2.2
4.8
4-hydroxy-tetrahydrodipicolinate synthase
6.2.1.1
2.0
0.0
0.0
0.0
0.5
1.0
2.0
2.0
6.0
1.0
15.0
5.2
5.8
10.4
6.2.1.3
1.0
1.0
1.0
0.0
0.8
0.5
4.0
5.0
6.0
2.0
9.0
5.2
2.6
6.9
Long-chain-fatty-acid-CoA ligase
6.2.1.26
0.0
1.0
1.0
0.0
0.5
0.6
1.0
1.0
1.0
0.0
3.0
1.2
1.1
2.4
o-Succinylbenzoate-CoA ligase
Acetate-CoA ligase
Abbreviations: Ct, Corynebacterium terpenotabidum; EC no., Enzyme Commission number; Ef, Enterococcus faecalis; Gp, Gordonia polyisoprenivorans; hs, Haemophilus somnus; Mm, Mycobacterium marinum; Ro,
Rhodococcus opacus; Sm, Stenotrophomonas maltophilia; So, Streptococcus oralis; Ss, Staphylococcus saprophyticus.
aFactor difference where Av
GCr is for average of GC-rich and AvGCp is for average of GC-poor.
The shading regions in Table 7 is to improve the contrast between GC-rich (gray) and GC-poor (white) genomes.
13
Bioinformatics and Biology Insights
14
Table 8. Enzyme reactions of Table 7 classiied by decreasing AvGCr/AvGCp.
AVGCR/AVGCPa
S. NO.
EC NO.
ENZYMATIC FUNCTION
10 to 16
1
1.13.11.2
Catechol 2,3-dioxygenase
2
1.7.99.4
Nitrate reductase
3
1.14.13.1
Salicylate 1-monooxygenase
4
2.7.11.1
Nonspeciic serine/threonine protein kinase
5
6.2.1.1
Acetate—CoA ligase
1
1.2.1.27
Methylmalonate-semialdehyde dehydrogenase
2
1.2.1.8
Betaine-aldehyde dehydrogenase
3
6.2.1.3
Long-chain-fatty-acid-CoA ligase
4
3.1.6.1
Arylsulfatase
5
1.1.99.1
Choline dehydrogenase
6
1.2.1.3
Aldehyde dehydrogenases
7
1.13.11.24
quercetin 2,3-dioxygenase
8
1.14.99.3
heme oxygenase—biliverdin-producing
9
1.4.1.13
Glutamate synthase—NADPh
10
1.4.1.1
Alanine dehydrogenase
1
4.3.3.7
4-hydroxy-tetrahydrodipicolinate synthase
2
1.2.1.2
Formate dehydrogenase
3
1.6.1.2
NAD(P)+ transhydrogenase—Re/Si-speciic
4
1.3.99.1
Succinate dehydrogenase
5
1.4.4.2
Glycine dehydrogenase—aminomethyl-transferring
6
1.17.7.1
Cytidine diphosphate-4-dehydro-6-deoxyglucose reductase
1
2.10.1.1
Molybdopterin molybdotransferase
2
1.2.1.7
Benzaldehyde dehydrogenase
3
1.4.1.2
Glutamate dehydrogenase
4
1.18.1.2
Ferredoxin—NADP+ reductase
5
1.2.1.38
N-acetyl-g-glutamyl-phosphate reductase
6
3.3.2.1
Isochorismatase
7
6.2.1.26
o-Succinylbenzoate—CoA ligase
8
1.2.1.10
Acetaldehyde dehydrogenase
9
1.2.1.70
Glutamyl-tRNA reductase
1
1.4.3.1
D-Aspartate
2
1.5.3.1
Sarcosine oxidase
3
1.14.13.8
Flavin-containing monooxygenase
4
1.18.1.3
Ferredoxin—NAD+ reductase
5
2.8.3.1
Propionate CoA-transferase
5 to <10
4 to <5
2 to <4
<2
Abbreviation: EC no., Enzyme Commission number.
aFactor difference where Av
GCr is for average of GC-rich and AvGCp is for average of GC-poor.
oxidase
Carels et al
15
with KEGG for the EC number list of Table 6. In contrast,
19 715 significant homologies (best hit) were found among
the putative 35 105 cORFs with the representative species of
the genera of GC-poor bacteria reported by Gumiel et al.8
Among the 19 715 sequences, 1424 (7.2%) had significant
homologies with KEGG for the EC number list of Table 6.
Because the sample sizes are very different, one must be concerned with a statistical consistency of the factor ~2.7 (close
to the theoretical value of 3.3 found with the model) of overrepresented enzymes (Tables 6 and 7) observed in GC-rich
compared with GC-poor bacteria. The Z test applied to the
comparison of 2 proportions allows the formal conclusion
that the null hypothesis of proportion equality must be
rejected because Zobs (19) > Zth (1.96). Thus, despite a bias
introduced by LB fermentation is favorable for the growth of
GC-poor bacteria, the conclusion from the shotgun DNA
sequencing is that overrepresented enzymatic activities are
more frequent, in relative terms, in a medium fermented by
GC-rich than in a medium fermented by GC-poor bacteria,
as suggested by the model analysis based on complete genome
sequences available from National Center for Biotechnology
Information (NCBI).
Discussion
A recent investigation compared the microbiota of DTT in the
presence or absence of T cruzi.24 Globally, it showed a predominance of GC-rich bacterial species (without considering the
intracellular endosymbiont Arsenophonus)25 as previously
described7,8 except for Staphylococcus which predominated in
some individuals of P megistus and T infestans. However, the
results of Díaz et al24 may be equivocal because the V3-V4
hypervariable region of 16S rDNA produces a less accurate
quantitative description than with the V1-V3 region used by
Gumiel et al,8 particularly for Staphylococcus. Also, with the
V3-V4 region, differences between expected and observed frequencies of 10 to 300 times were reported by Zheng et al26 for
this latter genus, whereas the measure obtained using V1-V3
region was close to the expected value.
The GC level of a genome is an interesting variable to consider because it is robust in the sense that it is expected to be
globally conserved at the level of the family rank.13 Thus, if one
GC-poor bacterial species is present in one family, there is a
major likelihood that another species of that family will also be
GC-poor. The same reasoning also applies to GC-rich organisms. However, in special situations, such as in endosymbiosis
where the selective constraints are not those normally encountered by the members of the family, the above tendency is violated because endosymbionts are generally GC-poor,
independent of the family they belong to. The GC-poor trait of
endosymbionts may be due to an evolutionary convergence
induced by the peculiar constraints imposed by the intracellular
environment although this is debateable.27 The fact that the
luminal environment of DTT in which T cruzi thrives is very
Figure 1. Relative frequency of shotgun sequences associated with
bacteria found in the digestive tract of triatomines. (A) Size of reads
obtained by 454 Titanium technology, (B) contig size after successive
read assembling with Velvet and CAP3, (C) size of ORFs extracted
from read contigs, and (D) size of coding ORFs after UFM iltering.
cORFs indicate coding open reading frames; ORFs, open reading
frames.
16
different compared with that of intestinal epithelial cells was sufficient to eliminate the endosymbionts, Arsenophonus, Wolbachia,
Candidatus, and Rohrkolberia from the present analysis.
Predominant bacterial species are GC-rich,8 which raises
the question of whether a cause and effect relationship exists
between a bacterial species being GC-rich and its growth success28,29 in the DTT environment. In fact, the positive correlation between GC3 and genome size suggests that, in the DTT,
bacterial species with GC-rich genomes have a potentially
more complex metabolism than those with GC-poor genomes,
which would be an advantage for the bacteria in this niche.
The most significant differences that we found between the
bacterial groups were due to oxidoreductase enzymes that are
much more numerous in GC-rich than GC-poor bacteria and
seem to confer a metabolic advantage to GC-rich bacteria in
an environment such as blood, in particular, nitrate reductases
and oxygenases that are common in GC-rich bacteria.
In the enzymatic reaction involving nitrate reductases (EC
1.7.99.4, KEGG map 910),30 the electron transport system is
similar to that of aerobic respiration.31,32 It can be complemented by vitamin K to generate the energy required to survive
in anaerobic conditions.33
Oxygenases are enzymes that oxidize a substrate by the
transference of gaseous oxygen. Dioxygenases transfer both oxygen atoms of O2 into the substrate,34 whereas monooxygenases,
such as phenolases (cytochrome P450 oxidases), incorporate only
one atom of molecular oxygen into a substrate, such as phenols,
and the other atom is reduced to H2O.35 Oxygenases are usual
in soil bacteria because oxygen reactivity plays important roles
in the degradation of complex substrates. In particular, ringcleaving dioxygenases catalyze key reactions in the aerobic
microbial degradation of aromatic compounds. Many pathways
converge to catecholic intermediates. An example of the degradadation of a complex substrate is catechol 2,3-dioxygenases (EC
1.13.11.2) that catalyzes the opening of the benzene ring
(KEGG maps 361, 362, 622, 643) and converts catechol into
semialdehyde (OHC-R-COOH).36,37 Ring-cleaving dioxygenases that are active toward ring compounds belong to the
cupin superfamily. Cupin-type dioxygenases also involve quercetinases (flavonol 2,4-dioxygenases), which open up 2 C-C bonds
of the heterocyclic ring of quercetin, a widespread plant flavonol.38 In GC-rich bacteria, several other enzymes involved in
ring modification or heteroatom oxidation are also available
such as (1) arylsulfatases (EC 3.1.6.1), (2) benzaldehyde dehydrogenases—NADP+ (EC 1.2.1.7),39 and (3) flavin-containing
monooxygenases (EC 1.14.13.8), which can oxidize a wide array
of heteroatoms, particularly soft nucleophiles, such as amines,
sulfides, and phosphites from xeno-substrates, with no common structural features, to facilitate their excretion.40,41
In the DTT, oxygenases have been shown to allow the access
to iron of bacteria that encode that enzymatic system via
hemoglobin degradation with heme oxygenases—biliverdin-producing (EC 1.14.99.3).42 Heme oxygenase is an enzyme that
Bioinformatics and Biology Insights
catalyzes the degradation of heme and produces biliverdin,
iron, and carbon monoxide.43–45 Biliverdin is subsequently converted to bilirubin by biliverdin reductases. Iron is an essential
nutrient required for the survival of most bacteria.46
Bioavailability of iron in many environments such as soil or sea
is limited by the very low solubility of the Fe3+ ion. Microbes
release siderophores to scavenge iron from these mineral phases
by formation of soluble Fe3+ complexes that can be taken up by
active transport mechanisms. Many siderophores are nonribosomal peptides,47,48 although several are biosynthesized independently. Some pathogenic bacteria, such as S marcescens, can
use heme and hemoproteins as iron sources, independently of
siderophore production, by mechanisms involving outer membrane heme-binding proteins and heme transport systems.49,50
The iron-binding protein, transferrin, produces a marked
increase in S marcescens hemolytic activity.51
The levels of extracellular iron available within a host are limited, with most of the free iron being complexed to high-affinity
binding proteins such as transferrin. To circumvent this low iron
availability, pathogens have developed sophisticated mechanisms
to use the host’s iron-containing and heme-containing proteins.
The mechanism by which gram-positive bacteria, such as
Corynebacterium diphtheriae, acquire heme is similar to the heme
transport with siderophore52 and involves iron-chelating molecules excreted in the bacterial environment. Once the heme has
been transported across the outer membrane and is localized
within the cytoplasm, it is degraded by heme oxygenase.53,54
In contrast to oxygenases, oxidases (that reduce molecular
oxygen to hydrogen peroxide or to water) and dehydrogenases
(by transferring hydrogen from one substance to another) are
mainly, if not exclusively, involved in energy metabolism. Many
of the hydrogenases predominating in GC-rich bacteria are
involved in many different central pathways such as glycolysis,
the tricarboxylic acid cycle and oxidative phosphorylation that
are essential to cell success in their environment.
As shown by Unrean and Srienc,55 a “cell system has a natural tendency to evolve with time towards an asymptotic state
with maximum rate of entropy production.” In addition, De
Martino et al56 showed that “growth rate can be explained in
terms of a trade-off between the higher fitness of fast-growing
phenotypes and the higher entropy of slow-growing ones.”
From the results of this article and those of Unrean and Srienc55
and De Martino et al,56 it can be deduced that the success of
GC-rich compared with GC-poor bacteria in the DTT is due
to their enhanced ability to metabolize chemically complex
substrates. The higher entropy of their metabolic networks may
at least result from the predominance of hydrogenase functions
in central metabolic pathways such as those for amino acid and
nucleotide metabolism. In parallel with these increases in
enzymatic functions, a conserved set of CoA enzymes was also
found to be predominant in GC-rich bacteria and involved in
different pathways such as the synthesis of chemical bond
between large molecules (EC 6.2.1.1),57 toxic compound
Carels et al
degradation (EC 2.8.3.1, EC 1.2.1.10),58 and fatty acid (EC
6.2.1.3) and amino acid (EC 4.3.3.7)59 metabolism.
The higher metabolic activity found in GC-rich bacteria suggests that signaling proteins should also be significantly increased.
Indeed, a large difference was found for the nonspecific serine/
threonine protein kinases (EC 2.7.11.1), which belong to the family of transferases, specifically protein-serine/threonine kinases.
These enzymes transfer phosphates to the oxygen atom of a serine or threonine side chain in proteins. This process is called
phosphorylation and is known to regulate most of the cellular
pathways, especially those involved in signal transduction.60
In agreement with De Martino et al,56 the size inversion of
GC-rich and GC-poor bacterial population found in the shotgun sequencing analysis is not surprising because the rich LB
medium is more favorable for fast-growing bacteria with small
genomes and less enzymatic abilities (lower metabolic network
entropy). During experiments in this study, to have sufficient
DNA, it was necessary to amplify it for sequencing and bacterial
culturing were necessary steps, but, of course, at a cost of a bias.
The population bias favored GC-poor bacteria and demonstrates the importance of using culture-independent techniques
for in situ microbiota investigation. In this respect, the strategy
of describing the microbiota composition by 16S rDNA
sequencing prior to any further metagenomic description is
surely the best, provided that the complete genome sequences for
the metagenomes investigated are available. Thus, complete
genome sequences allow the construction of a model suitable to
determine what can be reasonably expected from the present
experiments. Despite its bias, the shotgun analysis undertaken
shows that the inferences proposed through the present model
are still relevant. Therefore, the species believed to be representative of their respective genera are indeed representative in the
context of this work because the proportion of predominant
enzymes in the experiments is similar to that of the model.
Shotgun sequencing of microbiota is expensive, and the
large amount of data provided can be difficult to analyze, especially when a eukaryote vector and blood meal source are
involved as most of the sequences come from the host gut and
not from the rare microbiota it contains.61 For instance, the
genome of Rhodnius prolixus RproC1 was predicted to be about
733 Mb, whereas the average size of each bacterial genome
sequence of its digestive tract is only about 4 to 5 Mb.62 Another
limitation of shotgun sequencing is that the information it provides on the composition of a microbiota depends on a reference set of microbial genomes which is still only small, typically
in the range of few thousand genomes.63 In contrast, large
numbers of 16S rDNA gene sequences are available for comparative analyses. For example, the RDP Release 11.5 of
September 30, 2016 consisted of 3 356 809 aligned and annotated 16S rDNA sequences (http://rdp.cme.msu.edu/).
The 1550 bp of the 16S gene consist of 8 highly conserved
regions (U1-U8) and 9 variable regions across the bacterial
domain.64 Identifying the organisms populating a microbial
17
community and their relative abundances is the typical primary
objective of investigations based on 16S rDNA amplicon characterization. A similarity comparison of 16S gene sequences is
usually used as the gold standard for taxonomic identification at
least at the genus level.65 The characterization of 16S amplicons
by DGGE has been a useful technique for rapid assessment of
the composition of DTT microbiota and is particularly suitable
for a first-pass comparison of multiple samples.61 However,
sequencing the 16S gene is currently the most common approach
used in microbial classification.66 The application of next-generation sequencing to microbial ecology has shown that the diversity in microbial populations is significantly higher than
previously estimated by traditional culture-based and conventional molecular methods.67 New technologies of DNA microarray (PhyloChip, Second Genome Inc, South San Francisco,
USA) are now supporting microbiota investigations by 16S
rDNA classification and offer the benefit of simultaneous detection of thousands of genes in a single shot.68,69 Core genes
enriched for housekeeping functions are also used to enrich classification based on 16S rDNA and to improve the resolution of
microbial community structure.70
Conclusions
The qualitative and quantitative description of a microbiota, as
adapted from Gumiel et al,8 is more precise than a blind
metagenomic analysis by DNA shotgun as long as complete
genome sequences exist for the bacterial genera diagnosed by
16S rDNA. This is precisely the case in the present investigation
as most of the genomes in the list of bacterial species identified
by 16S rDNA sequencing by Gumiel et al8 had companion species effectively sequenced that can be downloaded from the
NCBI server. The most striking differences in overrepresented
enzymatic functions that are found in GC-rich bacteria (prominent in the colonization of the triatomine digestive tract compared with GC-poor ones) are for the most part due to
oxidoreductases. We conclude that this group of enzymatic functions allows GC-rich bacteria to outcompete GC-poor ones in
an environment where the fermentation of a medium such as
fresh blood may need some specific metabolic activities such as
iron recycling and oxygen management. In such a context,
GC-rich bacteria would have a comparative advantage in the
colonization of their environment, thanks to their more complex
enzymatic apparatus, however, at the cost of a larger genome that
is slower to replicate. In consequence, invertebrate vectors are
valuable systems in which to study the properties that may favor
one particular microbial community as opposed to another.71
Acknowledgements
The authors thank Norman A Ratcliffe for reading and editing
the manuscript.
Author Contributions
NC, MG, FFdM, CJdCM, and PA conceived and designed the
experiments. NC analyzed the data and wrote the first draft of
Bioinformatics and Biology Insights
18
the manuscript. PA contributed to the writing of the manuscript. All authors reviewed and approved the final manuscript.
Disclosures and Ethics
As a requirement of publication, author(s) have provided to the
publisher signed confirmation of compliance with legal and
ethical obligations including but not limited to the following:
authorship and contributorship, conflicts of interest, privacy
and confidentiality, and (where applicable) protection of human
and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that
this article is unique and not under consideration or published
in any other publication, and that they have permission from
rights holders to reproduce any copyrighted material. Any disclosures are made in this section. The external blind peer
reviewers report no conflicts of interest.
REFERENCES
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
World Health Organization. WHO media centre: Chagas disease (American
trypanosomiasis). http://www.who.int/mediacentre/factsheets/fs340/en/. Published 2014. Accessed January 9, 2015.
Moraes CB, Giardini MA, Kim H, et al. Nitroheterocyclic compounds are more
eicacious than CYP51 inhibitors against Trypanosoma cruzi: implications for
Chagas disease drug discovery and development. Sci Rep. 2014;4:4703.
Quijano-Hernandez I, Dumonteil E. Advances and challenges towards a vaccine
against Chagas disease. Hum Vaccin. 2011;7:1184–1191.
Azambuja P, Feder D, Garcia ES. Isolation of Serratia marcescens in the midgut
of Rhodnius prolixus: impact on the establishment of the parasite Trypanosoma
cruzi in the vector. Exp Parasitol. 2004;107:89–96.
Castro D, Morales C, Garcia E, Azambuja P. Inhibitory efects of d-mannose on
trypanosomatid lysis induced by Serratia marcescens. Exp Parasitol.
2007;115:200–204.
Garcia ES, Genta F, Azambuja P, Schaub GA. Interactions between intestinal compounds of triatomines and Trypanosoma cruzi. Trends Parasitol. 2010;26:499–505.
da Mota FF, Marinho LP, Moreira CJC, et al. Cultivation-independent methods
reveal diferences among bacterial gut microbiota in triatomine vectors of Chagas
disease. PLoS Negl Trop Dis. 2012;6:e1631.
Gumiel M, da Mota FF, Rizzo VS, et al. Characterization of the microbiota in
the guts of Triatoma brasiliensis and Triatoma pseudomaculata infected by Trypanosoma cruzi in natural conditions using culture independent methods. Parasite
Vector. 2015;8:245.
Dennison NJ, Jupatanakul N, Dimopoulos G. he mosquito microbiota inluences
vector competence for human pathogens. Curr Opin Insect Sci. 2014;3:6–13.
Beard CB, Cordon-Rosales C, Durvasula RV. Bacterial symbionts of the triatominae and their potential use in control of Chagas disease transmission. Annu
Rev Entomol. 2002;47:123–141.
Durvasula RV, Sundaram RK, Kirsch P, et al. Genetic transformation of a corynebacterial symbiont from the Chagas disease vector Triatoma infestans. Exp Parasitol. 2008;119:94–98.
Bentley SD, Parkhill J. Comparative genomic structure of prokaryotes. Annu
Rev Genet. 2004;38:771–792.
Takahashi M, Kryukov K, Saitou N. Estimation of bacterial species phylogeny
through oligonucleotide frequency distances. Genomics. 2009;93:525–533.
Bohlin J, Snipen L, Hardy SP, et al. Analysis of intra-genomic GC content
homogeneity within prokaryotes. BMC Genomics. 2010;11:464.
Lightield J, Fram NR, Ely B. Across bacterial phyla, distantly-related genomes
with similar genomic GC content have similar patterns of amino acid usage.
PLoS One. 2011;6:e17677.
Wu H, Zhang Z, Hu S, Yu J. On the molecular mechanism of GC content variation among eubacterial genomes. Biol Direct. 2012;7:2.
Dias JCP, Ramos AN Jr, Gontijo ED, et al. II Consenso Brasileiro em doença de
Chagas, 2015. Epidemiol Serv Saúde. 2016;25:7–86.
Cuba CAC, Alvarenga NJ, Barretto AC, Marsden PD, Gama MP. Dipetalogaster
maximus (Hemiptera, Triatominae) for xenodiagnosis of patients with serologically detectable Trypanosoma cruzi infection. Trans R Soc Trop Med Hyg.
1979;73:524–527.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
Gouy M, Delmotte S. Remote access to ACNUC nucleotide and protein
sequence databases at PBIL. Biochimie. 2008;90:555–562.
Zerbino DR, Birney E. Velvet: algorithms for de novo short reads assembly using
de Bruijn graphs. Genome Res. 2008;18:821–829.
Huang X, Madan A. CAP3: a DNA sequence assembly program. Genome Res.
1999;9:868–877.
Carels N, Frias D. Classifying coding DNA with nucleotide statistics. Bioinform
Biol Insights. 2009;3:141–154.
Carels N, Frias D. A statistical method without training step for the classiication of
coding frame in transcriptome sequences. Bioinform Biol Insights. 2013;7:35–54.
Díaz S, Villavicencio B, Correia N, Costa J, Haag KL. Triatomine bugs, their
microbiota and Trypanosoma cruzi: asymmetric responses of bacteria to an
infected blood meal. Parasite Vector. 2016;9:636.
Hypsa V, Dale D. In vitro culture and phylogenetic analysis of “Candidatus
Arsenophonus triatominarum,” an intracellular bacterium from the triatomine
bug, Triatoma infestans. Int J Syst Bacteriol. 1997;47:1140–1144.
Zheng W, Tsompana M, Ruscitto A, et al. An accurate and eicient experimental approach for characterization of the complex oral microbiota. Microbiome.
2015;3:48.
Hildebrand F, Meyer A, Eyre-Walker A. Evidence of selection upon genomic
GC-content in bacteria. PLoS Genet. 2010;6:e1001107.
Ponce de Leon M, de Miranda A, Alvarez-Valin F, Carels N. he purine bias of
coding sequences is determined by physicochemical constraints on proteins. Bioinform Biol Insights. 2014;8:93–108.
Carels N, Ponce de Leon M. An interpretation of the ancestral codon from Miller’s amino acids and nucleotide correlations in modern coding sequences. Bioinform Biol Insights. 2015;9:37–47.
Tavares P, Pereira AS, Moura JJ, Moura I. Metalloenzymes of the denitriication
pathway. J Inorg Biochem. 2006;100:2087–2100.
Chen J, Strous M. Denitriication and aerobic respiration, hybrid electron transport chains and co-evolution. Biochim Biophys Acta. 2013;1827:136–144.
Segers FH, Kešnerová L, Kosoy M, Engel P. Genomic changes associated with
the evolutionary transition of an insect gut symbiont into a blood-borne pathogen. ISME J. 2017;11:1232–1244. doi:10.1038/ismej.2016.201.
Kwon O, Bhattacharyya DK, Meganathan R. Menaquinone (vitamin K2) biosynthesis: overexpression, puriication, and properties of o-succinylbenzoylcoenzyme A synthetase from Escherichia coli. J Bacteriol. 1996;178:6778–6781.
Hayaishi O. An odyssey with oxygen. Biochem Bioph Res Co. 2005;338:2–6.
Waterman MR. Professor Howard Mason and oxygen activation. Biochem Bioph
Res Co. 2005;338:7–11.
Junker F, Field JA, Bangerter F, et al. Dioxygenation and spontaneous deamination of 2-aminobenzene sulphonic acid in Alcaligenes sp. strain O-1 with subsequent meta ring cleavage and spontaneous desulphonation to 2-hydroxymuconic
acid. Biochem J. 1994;300:429–436.
Kobayashi S, Hayaishi O. Anthranilic acid conversion to catechol (Pseudomonas). Methods Enzymol. 1970;17:505–510.
Fetzner S. Ring-cleaving dioxygenases with a cupin fold. Appl Environ Microb.
2012;78:2505–2514.
Pastrorova I, de Koster CG, Boom JJ. Analytical study of free and ester bound
benzoic and cinnamic acids of gum benzoin resins by GC-MS and HPLC-frit
FAB-MS. Phytochem Anal. 1997;8:63–73.
Ziegler D. An overview of the mechanism, substrate speciicities, and structure
of FMOs. Drug Metab Rev. 2002;34:503–511.
Eswaramoorthy S, Bonanno JB, Burley SK, Swaminathan S. Mechanism of
action of a lavin-containing monooxygenase. Proc Natl Acad Sci U S A.
2006;103:9832–9837.
Wilks A, Schmitt MP. Expression and characterization of a heme oxygenase
(Hmu O) from Corynebacterium diphtheriae. Iron acquisition requires oxidative
cleavage of the heme macrocycle. J Biol Chem. 1998;273:837–841.
Kikuchi G, Yoshida T, Noguchi M. Heme oxygenase and heme degradation.
Biochem Bioph Res Co. 2005;338:558–567.
Ryter SW, Alam J, Choi AMK. Heme oxygenase-1/carbon monoxide: from basic
science to therapeutic applications. Physiol Rev. 2006;86:583–650.
Evans JP, Niemevz F, Buldain G, de Montellano PO. Isoporphyrin intermediate
in heme oxygenase catalysis. Oxidation of alpha-meso-phenylheme. J Biol Chem.
2008;283:19530–19539.
Wandersman C, Delepelaire P. Bacterial iron sources: from siderophores to
hemophores. Annu Rev Microbiol. 2004;58:611–647.
Hider RC, Kong X. Chemistry and biology of siderophores. Nat Prod Rep.
2010;27:637–657.
Miethke M, Marahiel M. Siderophore-based iron acquisition and pathogen control. Microbiol Mol Biol Rev. 2007;71:413–451.
Létofé S, Ghigo JM, Wandersman C. Iron acquisition from heme and hemoglobin by a Serratia marcescens extracellular protein. Proc Natl Acad Sci U S A.
1994;91:9876–9880.
Létofé S, Ghigo JM, Wandersman C. Secretion of the Serratia marcescens
HasA protein by an ABC transporter. J Bacteriol. 1994;176:5372–5377.
Carels et al
51.
52.
53.
54.
55.
56.
57.
58.
59.
60.
61.
Poole K, Braun V. Iron regulation of Serratia marcescens hemolysin gene expression. Infect Immun. 1988;56:2967–2971.
Fukushima T, Allred BE, Sia AK, Nichiporuk R, Andersen UN, Raymond KN.
Gram-positive siderophore-shuttle with iron-exchange from Fe-siderophore to
apo-siderophore by Bacillus cereus YxeB. Proc Natl Acad Sci U S A.
2013;110:13821–13826.
Matsui T, Furukawa M, Unno M, Tomita T, Ikeda-Saito M. Roles of distal Asp
in heme oxygenase from Corynebacterium diphtheriae, HmuO: a water-driven
oxygen activation mechanism. J Biol Chem. 2005;280:2981–2989.
Lai W, Chen H, Matsui T, et al. Enzymatic ring-opening mechanism of verdoheme by the heme oxygenase: a combined X-ray crystallography and QM/MM
study. J Am Chem Soc. 2010;132:12960–12970.
Unrean P, Srienc F. Metabolic networks evolve towards states of maximum
entropy production. Metab Eng. 2011;13:666–673.
De Martino D, Capuani F, De Martino A. Growth against entropy in bacterial
metabolism: the phenotypic trade-of behind empirical growth rate distributions
in E. coli. Phys Biol. 2016;13:036005. doi:10.1088/1478-3975/13/3/036005.
Gardner JG, Grundy FJ, Henkin TM, Escalante-Semerena JC. Control of acetyl-coenzyme A synthetase (AcsA) activity by acetylation/deacetylation without
NAD(+) involvement in Bacillus subtilis. J Bacteriol. 2006;188:5460–5468.
Manjasetty BA, Powlowski J, Vrielink A. Crystal structure of a bifunctional
aldolase-dehydrogenase: sequestering a reactive and volatile intermediate. Proc
Natl Acad Sci U S A. 2003;100:6992–6997.
Mirwaldt C, Korndörfer I, Huber R. he crystal structure of dihydrodipicolinate
synthase from Escherichia coli at 2.5 A resolution. J Mol Biol. 1995;246:227–239.
Wolanin PW, homason PA, Stock JB. Histidine protein kinases: key signal
transducers outside the animal kingdom. Genome Biol. 2002;3:reviews3013.
doi:10.1186/gb-2002-3-10-reviews3013.
Fraher MH, O’Toole PW, Quigley EM. Techniques used to characterize the gut
microbiota: a guide for the clinician. Nat Rev Gastroenterol Hepatol.
2012;9:312–322.
19
62.
63.
64.
65.
66.
67.
68.
69.
70.
71.
Mesquita RD, Vionette-Amaral RJ, Lowenberger C, et al. Genome of Rhodnius prolixus, an insect vector of Chagas disease, reveals unique adaptations to
hematophagy and parasite infection. Proc Natl Acad Sci U S A.
2015;112:14936–14941.
Markowitz VM, Chen IM, Palaniappan K, et al. IMG: the Integrated Microbial
Genomes database and comparative analysis system. Nucleic Acids Res.
2012;40:D115–D122.
Jonasson J, Olofsson M, Monstein HJ. Classiication, identiication and subtyping of bacteria based on pyrosequencing and signature matching of 16S rDNA
fragments. APMIS. 2002;110:263–272.
Clarridge JE III. Impact of 16S rRNA gene sequence analysis for identiication
of bacteria on clinical microbiology and infectious diseases. Clin Microbiol Rev.
2004;17:840–862.
Spiegelman D, Whissell G, Greer CW. A survey of the methods for the characterization of microbial consortia and communities. Can J Microbiol.
2005;51:355–386.
Yun JH, Roh SW, Whon TW, et al. Insect gut bacterial diversity determined by
environmental habitat, diet, developmental stage, and phylogeny of host. Appl
Environ Microb. 2014;80:5254–5264.
Gentry TJ, Wickham GS, Schadt CW, He Z, Zhou J. Microarray applications in
microbial ecology research. Microb Ecol. 2006;52:159–175.
DeSantis TZ, Brodie EL, Moberg JP, Zubieta IX, Piceno YM, Andersen GL.
High-density universal 16S rRNA microarray analysis reveals broader diversity
than typical clone library when sampling the environment. Microb Ecol.
2007;53:371–383.
Segata N, Huttenhower C. Toward an eicient method of identifying core genes
for evolutionary and functional microbial phylogenies. PLoS ONE.
2011;6:e24704.
Newton ILG, Sheehan KB, Lee FJ, Horton MA, Hicks RD. Invertebrate systems for hypothesis-driven microbiome research. Microbiome Sci Med. 2013;1:1–
9. doi:10.2478/micsm-2013-0001.