A Metagenomic Analysis of Bacterial Microbiota in the Digestive Tract of Triatomines

Nicolas Carels1, Marcial Gumiel2, Fabio Faria da Mota3, Carlos José de Carvalho Moreira4 and Patricia Azambuja2,5

Bioinformatics and Biology Insights Volume 11: 1–19 © The Author(s) 2017. Reprints and permissions: sagepub.co.uk/journalsPermissions.nav. DOI: 10.1177/1177932217733422

1Laboratório de Modelagem de Sistemas Biológicos, National Institute for Science and Technology on Innovation in Neglected Diseases (INCT-IDN), Centro de Desenvolvimento Tecnológico em Saúde (CDTS), Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, Brazil. 2Laboratório de Bioquímica e Fisiologia de Insetos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz (IOC/FIOCRUZ), Rio de Janeiro, Brazil. 3Laboratório de Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz (IOC/FIOCRUZ), Rio de Janeiro, Brazil. 4Laboratório de Doenças Parasitárias, Fundação Oswaldo Cruz (IOC, FIOCRUZ), Rio de Janeiro, Brazil. 5Departamento de Entomologia Molecular, Instituto Nacional de Entomologia Molecular (INCT-EM), Rio de Janeiro, Brazil.

ABSTRACT: The digestive tract of triatomines (DTT) is an ecological niche favored by microbiota whose enzymatic profile is adapted to the specific substrates available in this medium. This report describes the molecular enzymatic properties that promote bacterial prominence in the DTT. The microbiota composition had previously been assessed from 16S ribosomal DNA, and whole sequenced genomes of bacteria from the same genera were used to calculate the GC level of rare and prominent bacterial species in the DTT. The enzymatic reactions encoded by the coding sequences of both rare and common bacterial species were then compared, revealing key functions that explain why some genera outcompete others in the DTT.
Representativeness of the DTT microbiota was investigated by shotgun sequencing of DNA extracted from bacteria grown in liquid Luria-Bertani broth (LB) medium. Results showed that GC-rich bacteria outcompete GC-poor bacteria and are the dominant components of the DTT microbiota. In addition, oxidoreductases are the main enzymatic components of these bacteria. In particular, nitrate reductases (anaerobic respiration), oxygenases (catabolism of complex substrates), acetate-CoA ligase (tricarboxylic acid cycle and energy metabolism), and kinases (signaling pathways) were the major enzymatic determinants, together with a large group of minor enzymes, including hydrogenases involved in energy and amino acid metabolism. In conclusion, despite their slower growth in liquid LB medium, bacteria from GC-rich genera outcompete GC-poor bacteria because their specific enzymatic abilities impart a selective advantage in the DTT.

KEYWORDS: GC content, genome size, gene number, EC number, ecological niche, midgut

RECEIVED: December 23, 2016. ACCEPTED: April 10, 2017.

PEER REVIEW: Five peer reviewers contributed to the peer review report. Reviewers' reports totaled 878 words, excluding any confidential comments to the academic editor.

TYPE: Original Research

FUNDING: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: N.C. received a grant (PAPES-V) from Fiocruz/CNPq (http://portal.iocruz.br/pt-br; http://cnpq.br/) and M.G. a PhD scholarship from CAPES (http://www.capes.gov.br/). The publication fees were paid by INCT-IDN. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

DECLARATION OF CONFLICTING INTERESTS: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

CORRESPONDING AUTHOR: Nicolas Carels, Laboratório de Modelagem de Sistemas Biológicos, National Institute for Science and Technology on Innovation in Neglected Diseases (INCT-IDN), Centro de Desenvolvimento Tecnológico em Saúde (CDTS), Fundação Oswaldo Cruz (FIOCRUZ), Av. Brasil 4036, Manguinhos, 21040-361 - RJ, Rio de Janeiro, Brazil.

Introduction

Chagas disease remains a serious health concern in South American countries, with approximately 8 million people in the chronic phase of this parasitosis. Trypanosoma cruzi, the causative agent, is mainly transmitted to humans by insects of the subfamily Triatominae, which are distributed throughout the American continent.1 The host-parasite relationship between T cruzi and its vertebrate hosts has been extensively studied, and efforts to develop new drugs and vaccines continue.2,3 In contrast, few studies have addressed the relationship between T cruzi and the microbiota of the triatomine vector gut. Pioneering work by Azambuja et al4 showed that Serratia marcescens (family Enterobacteriaceae) is a major component of the bacterial microbiota in the digestive tract of triatomines (DTT); it may kill T cruzi through mannose-sensitive fimbriae5,6 and could thus affect the epidemiology of Chagas disease. The bacterial composition of the DTT was only recently investigated at the molecular level, through 16S ribosomal DNA (rDNA) characterization by da Mota et al7 and Gumiel et al.8 These 2 investigations found that the diversity of the bacterial microbiota is low (fewer than 10 major species) and that its composition varies with the host triatomine species.
Apart from the intracellular endosymbiont genera Arsenophonus, Wolbachia, and Candidatus Rohrkolberia, the major bacterial species found in the DTT belonged to the genus Serratia and to the suborder Corynebacterineae (Mycobacterium, Rhodococcus, Gordonia, Corynebacterium, and Dietzia). A similarly low-diversity microbiota has been described in females of 2 other hematophagous insects, the mosquitoes Anopheles gambiae and Aedes aegypti.9 The small number of major bacterial species in the DTT makes it feasible to investigate their molecular determinants. Besides these major species, da Mota et al7 and Gumiel et al8 also found several bacterial species that had not previously been reported to reproduce significantly in the DTT. These species belong to the following genera: Acinetobacter, Actinomyces, Adhaeribacter, Bradyrhizobium, Chryseobacterium, Comamonas, Diaphorobacter, Enterococcus, Erwinia, Geobacillus, Haemophilus, Hydrogenophilus, Janthinobacterium, Marinomonas, Microvirga, Pectobacterium, Propionibacterium, Providencia, Pseudomonas, Shinella, Sphingomonas, Staphylococcus, Stenotrophomonas, Streptococcus, Streptophyta, Williamsia, and Xanthobacter.

Creative Commons Non Commercial CC BY-NC: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
If some bacterial species reproduce optimally in the DTT, this must have a biochemical basis. Elucidating it matters because Chagas disease might be controlled through paratransgenesis, that is, by reducing vector competence with genetically modified symbionts.10,11 The aim of this investigation was to identify the enzymatic determinants behind the success in the DTT of bacterial genera such as Serratia and the members of Corynebacterineae, in contrast to minor genera, including Staphylococcus, Streptococcus, Haemophilus, and Enterococcus. To this end, the biochemical differences between the GC-rich (rich in guanine + cytosine) and GC-poor bacterial species found in the DTT were analyzed, with emphasis on specific enzymatic functions that could explain the success of the former over the latter.12–16

Materials and Methods

The methodological steps followed to analyze genome properties and enzymatic determinants in the particular niche of the DTT were as follows:

1. To set up a metagenomic model of the wild DTT microbiota from bacterial species whose genomes are completely sequenced and that are as close as possible (same genera) to those identified by 16S rDNA sequencing;
2. To characterize the gross genomic features (genome size, gene number, GC level, enzymatic annotation) of the bacteria in the metagenomic model;
3. To formulate a working hypothesis, based on the metagenomic model, about the enzymatic determinants that allow certain bacteria to outcompete others in the DTT niche;
4. To validate the inference drawn from the metagenomic model through a bench experiment using a quantitative marker.

The bench experiment was a shotgun sequencing characterization of the bacterial population from DTTs after culture in Luria-Bertani broth (LB). The quantitative marker was the proportion of enzyme annotations that are overrepresented in GC-rich compared with GC-poor bacterial species, relative to the DNA-encoded protein sample from these bacteria.
Ethics statement

The animals used for blood feeding the triatomines at FIOCRUZ were treated according to the Ethical Principles in Animal Experimentation approved by the Ethics Committee in Animal Experimentation (CEUA/FIOCRUZ) under license number LW-24/2013, following the protocol of the Conselho Nacional de Experimentação Animal/Ministério de Ciência e Tecnologia. Triatomines were captured under license L14323-7, granted by the Sistema de Autorização e Informação em Biodiversidade (SISBIO) of the Instituto Chico Mendes de Conservação da Biodiversidade/Ministério do Meio Ambiente (MMA).

Triatomine colonies, gut dissection, and bacterial cultures

Triatoma infestans, Triatoma vitticeps, Panstrongylus megistus, and Rhodnius neglectus are of epidemiologic importance.17 Dipetalogaster maximus has no epidemiologic importance but shows good susceptibility to T cruzi, is used in xenodiagnosis, and is limited to Southern California and Mexico, where it lives in rocky habitats in association with lizards.18 The male and female triatomines used were in the fifth instar and had been maintained on chicken blood for approximately 20 generations in the Laboratório de Doenças Parasitárias (Instituto Oswaldo Cruz, IOC; Fundação Oswaldo Cruz, FIOCRUZ). Triatomines were dissected 7 to 10 days after feeding by opening the dorsal side from the posterior end of the abdomen to the last thoracic segment. The midgut (stomach and intestine) and hindgut (rectum) were meticulously dissected using a sterile ultrafine insulin syringe needle. Feces were obtained by abdominal compression or from spontaneous ejections immediately after feeding. Guts and feces were collected together in sterile Eppendorf tubes and kept at −20°C until use. All steps were performed under aseptic conditions. Three guts, together with their feces, from each of T infestans, T vitticeps, D maximus, P megistus, and R neglectus were then incubated in 2-mL Eppendorf tubes filled with liquid LB (Sigma-Aldrich Brasil Ltda., Sao Paulo, Brazil) at 30°C without agitation for approximately 48 hours, until turbidity due to bacterial growth became evident.

DNA extraction from bacterial cultures

After incubation in LB medium for 48 hours at 30°C without agitation in 2-mL Eppendorf tubes, the triatomine digestive tracts were removed, and DNA was extracted from the remaining bacterial suspensions with the Fast DNA Spin Kit for Soil (BIO 101 Systems; Qbiogene, Carlsbad, CA, USA) according to the manufacturer's protocol. DNA concentrations were determined using a NanoDrop spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA). About 1 µg of DNA from each sample of the 5 triatomine species was then amplified with a Nextera DNA Library Preparation Kit (Illumina, San Diego, CA, USA) and sequenced with 454 Titanium technology.

Microbial composition of triatomine digestive tract, sequence databases, and GC content

The predominant bacterial genera identified by denaturing gradient gel electrophoresis (DGGE) in the digestive tract of T infestans, T vitticeps, D maximus, P megistus, and R neglectus were Serratia, Erwinia, Candidatus Rohrkolberia, Providencia, Pectobacterium, and Arsenophonus.7 Among these 5 triatomine species, those of the genus Triatoma had a more diverse microbiota. An earlier, more detailed study of the microbiota of digestive tracts from field-collected Triatoma brasiliensis8 showed that the most abundant bacterial genera found by 454 sequencing of 16S rDNA were Gordonia sp. (36%), Serratia sp. (18%), Mycobacterium sp. (18%), Corynebacterium (6%), and Rhodococcus sp. (6%). Serratia was the most widely distributed genus among the triatomine species investigated here (Table 1).
Complete genome sequences are available for at least one species in most of the bacterial genera identified in this work (Table 1) and can be downloaded from ftp://ftp.ncbi.nih.gov/genomes/genbank/bacteria/. The coding sequences (CDSs) were retrieved from these genome sequences, available in *.fna files (see Table 1), by homologous comparison (tBLASTn) with their protein sequences, available in *.faa files ("*" stands for the name of the bacterial species under consideration). The average GC content was then calculated with a Perl script for (1) whole CDSs and (2) the first (GC1), second (GC2), and third (GC3) codon positions of each genome's CDS set. When no complete genome sequence was available for a given bacterial genus, as for Arsenophonus sp. and Dietzia sp., a CDS sample was retrieved from GenBank (release 208, June 15, 2015) through the Infobiogen server (see http://www.infobiogen.fr) and the ACNUC/QUERY retrieval system19 with the option t = cds. In most cases, the CDS samples used here for the GCx calculations (x = 1, 2, 3, or their average) come from bacterial species other than those diagnosed by da Mota et al7 and Gumiel et al,8 but the GC content obtained from these species was considered representative of the genus to which they belong. This assumption follows Takahashi et al,13 who reported that phylogenetic trees built from the oligonucleotide frequencies of bacterial species with similar GC contents had topologies congruent, at the genus and family levels, with those built from homologous genes. Bacterial genomes were classified as GC-poor or GC-rich according to whether their GC3 was lower or higher than 50%.

Enzymatic profiling of bacterial genomes used as references

For this study, the Enzyme Commission (EC) number classification of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) was used.
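The GCx calculation above was done with a Perl script that is not shown. As a minimal Python sketch, assuming each CDS is an in-frame nucleotide string and that per-genome values are means over CDSs (consistent with the σ columns of Table 2), the computation and the 50% GC3 split could look like the following. The function names are hypothetical, and the handling of a GC3 of exactly 50%, which the text does not address, is an assumption.

```python
import statistics
from typing import Dict, List, Tuple


def codon_gc(cds: str) -> Dict[str, float]:
    """GC, GC1, GC2 and GC3 (%) of one in-frame coding sequence."""
    cds = cds.upper()
    cds = cds[: len(cds) - len(cds) % 3]  # drop a trailing partial codon, if any
    if not cds:
        raise ValueError("CDS shorter than one codon")

    def pct(s: str) -> float:
        return 100.0 * sum(base in "GC" for base in s) / len(s)

    return {
        "GC": pct(cds),
        "GC1": pct(cds[0::3]),  # first codon positions
        "GC2": pct(cds[1::3]),  # second codon positions
        "GC3": pct(cds[2::3]),  # third codon positions
    }


def genome_gc_profile(cdss: List[str]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of each GCx over a genome's CDSs."""
    per_cds = [codon_gc(c) for c in cdss]
    profile = {}
    for key in ("GC", "GC1", "GC2", "GC3"):
        values = [d[key] for d in per_cds]
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        profile[key] = (statistics.mean(values), sd)
    return profile


def gc_class(mean_gc3: float, threshold: float = 50.0) -> str:
    """'Rich' if mean GC3 exceeds the threshold, else 'Poor'.

    Assumption: a GC3 of exactly 50% is classed as 'Poor'.
    """
    return "Rich" if mean_gc3 > threshold else "Poor"
```

For example, `codon_gc("ATGGCC")` gives a GC3 of 100% because both third positions (G and C) are G or C, while its GC1 and GC2 are 50%.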
Different enzymes that catalyze the same reaction receive the same EC number. Approximately 5500 enzyme reactions have been classified according to a 4-digit hierarchy in which each digit progressively refines the description. The first digit gives the type of reaction, divided into 6 main classes: oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases. The second digit describes the type of chemical group the reaction acts on, and the third digit often describes the type of donor or acceptor group. Finally, the fourth digit associates the enzyme with its reaction name (see a complete description at http://www.chem.qmul.ac.uk/iubmb/enzyme/). The protein sequences in files (1) NC_018221.faa (Enterococcus faecalis), NC_010519.faa (Haemophilus somnus), NC_007350.faa (Staphylococcus saprophyticus), and NC_015291.faa (Streptococcus oralis) and (2) NC_010612.faa (Mycobacterium marinum), NC_012522.faa (Rhodococcus opacus), NC_016906.faa (Gordonia polyisoprenivorans), NC_020064.faa (Serratia marcescens), and NC_021663.faa (Corynebacterium terpenotabidum) were considered representative of the GC-poor and GC-rich bacterial species found in the DTT, respectively.8 The protein sequences of each of these files were compared (BLASTp) with the enzyme sequences of the Kyoto Encyclopedia of Genes and Genomes database (KEGG, June 2015 version; http://www.genome.jp/kegg/), in which EC numbers are available, taking only the best hits (E value < 0.0001) into account. A homologous hit was considered significant when its identity rate was at least 60% over at least 33 amino acids.
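The homology filter described above (best hit per query, E value < 0.0001, identity of at least 60% over at least 33 amino acids) can be sketched on BLAST+ tabular output. This minimal sketch assumes the default 12-column `-outfmt 6` layout, takes "best hit" to mean the highest bit score, and treats the alignment-length column as the "at least 33 amino acids" criterion. The paper does not say how the filter was implemented, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def parse_outfmt6(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per hit, converting the fields used by the filter."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        row["pident"] = float(row["pident"])
        row["length"] = int(row["length"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        yield row


def significant_best_hits(lines: Iterable[str], max_evalue: float = 1e-4,
                          min_identity: float = 60.0,
                          min_length: int = 33) -> Dict[str, dict]:
    """Keep each query's best hit (highest bit score, an assumption), then
    retain it only if it passes the E value, identity and length cutoffs."""
    best: Dict[str, dict] = {}
    for hit in parse_outfmt6(lines):
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return {
        query: hit for query, hit in best.items()
        if hit["evalue"] < max_evalue
        and hit["pident"] >= min_identity
        and hit["length"] >= min_length
    }
```

A typical call would be `significant_best_hits(open("hits.tsv"))`, where `hits.tsv` is a hypothetical BLASTp output file. The `sseqid` of each retained hit would then be mapped to its KEGG EC number, a step not shown here.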
For each file of the homology comparison, the EC numbers (http://www.enzymedatabase.org/class.php) were grouped according to their first digit (6 classes), second digit (67 subclasses), third digit (264 sub-subclasses), and fourth digit (the whole set of 5549 approved enzymes available from BRENDA, online release as of June 30, 2015; http://www.brenda-enzymes.org/all_enzymes.php; see the "Results" section). Finally, the relative frequencies of EC numbers per functional category were compared between the GC-poor and GC-rich genomes of the bacterial species considered representative of the bacterial genera diagnosed in the DTT.

Shotgun sequencing and analysis of DNA from bacterial cultures of triatomine guts

Shotgun sequencing of bacteria from the DTT incubated in LB medium was performed to test the validity of the model proposed above. At the time of the experiment, a representative amplification by polymerase chain reaction would have required more DNA than could be obtained by direct extraction from triatomine feces. A culture step was therefore introduced before shotgun library construction, in the knowledge that it would introduce a bias due to the different growth conditions in LB and in the DTT. The 723 543 reads obtained by 454 sequencing were assembled into 16 435 contigs with Velvet20 following http://ged.msu.edu/angus/tutorials-2011/short-read-assembly-velvet.html (k = 31, ie, 31-mers were used to look for overlaps between reads) and further assembled with CAP321 to obtain 14 269 nonredundant sequences (supplementary file S1). We then extracted the 738 297 open reading frames (ORFs) from both strands of these 14 269 sequences and filtered them for CDS ORFs (cORFs) longer than 99 bp (base pairs) using the universal feature method (UFM),22,23 ending up with 35 105 cORFs compatible with the purine bias found in CDSs (supplementary file S2). These sequences were then compared (BLASTx) with a data set composed of the protein sequences of GC-rich bacteria from the whole genomes of C terpenotabidum (NC_021663.fna), G polyisoprenivorans (NC_016906.fna), M marinum (NC_010612.fna), R opacus (NC_012522.fna), and S marcescens (NC_020064.fna), downloaded from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/, and the homologies with identity rates ≥60% over ≥33 amino acids were retained. We retrieved the sequence subset corresponding to these homologies with a Perl script and compared it (BLASTx) with the subset of KEGG protein sequences corresponding to the EC numbers that, according to their first 3 digits, are more frequent in GC-rich than in GC-poor bacteria. Finally, we repeated the same exercise with the protein sequences from the whole genomes of the GC-poor bacteria, ie, S saprophyticus (NC_007350.fna), H somnus (NC_010519.fna), E faecalis (NC_018221.fna), and S oralis (NC_015291.fna), and compared the results.

Table 1. Sequence materials for GC and enzymatic characterization.

| Genera | Freq. | Whole genome sequences | GC |
|---|---|---|---|
| Acinetobacter^a | 2 | NA | |
| Actinomyces^a | 1 | NA | |
| Adhaeribacter^a | 1 | NA | |
| Arsenophonus^b | Endo | GenBank | Poor |
| Bradyrhizobium^a | 1 | Bradyrhizobium_japonicum_USDA_6_uid158851/NC_017249.fna | Rich |
| Chryseobacterium^a | 1 | NA | |
| Comamonas^a | 1 | Comamonas_testosteroni_CNB_2_uid62961/NC_013446.fna | Rich |
| Corynebacterium^a | 2015 | Corynebacterium_terpenotabidum_Y_11_uid210639/NC_021663.fna | Rich |
| Diaphorobacter^a | 2 | NA | |
| Dietzia^a | 5008 | GenBank | Rich |
| Enterococcus^a | 6 | Enterococcus_faecalis_D32_uid171261/NC_018221.fna | Poor |
| Erwinia^b | Low | Erwinia_amylovora_ATCC_49946_uid46943/NC_013971.fna | Rich |
| Geobacillus^a | 2 | NA | |
| Gordonia^a | 11825 | Gordonia_polyisoprenivorans_Vh2_uid86651/NC_016906.fna | Rich |
| Haemophilus^a | 1 | Haemophilus_somnus_2336_uid57979/NC_010519.fna | Poor |
| Hydrogenophilus^a | 15 | NA | |
| Janthinobacterium^a | 1 | NA | |
| Marinomonas^a | 1 | Marinomonas_posidonica_IVIA_Po_181_uid67323/NC_015559.fna | Poor |
| Microvirga^a | 2 | NA | |
| Mycobacterium^a | 5737 | Mycobacterium_marinum_M_uid59423/NC_010612.fna | Rich |
| Pectobacterium^b | Low | Pectobacterium_carotovorum_PC1_uid59295/NC_012917.fna | Rich |
| Propionibacterium^a | 5 | Propionibacterium_propionicum_F0230a_uid170533/NC_018142.fna | Rich |
| Pseudomonas^a | 4 | Pseudomonas_aeruginosa_RP73_uid209328/NC_021577.fna | Rich |
| Rhodococcus^a | 1855 | Rhodococcus_opacus_B4_uid13791/NC_012522.fna | Rich |
| Serratia^a | 4917 | Serratia_marcescens_FGI94_uid185180/NC_020064.fna | Rich |
| Shinella^a | 2 | NA | |
| Sphingomonas^a | 1 | Sphingomonas_wittichii_RW1_uid58691/NC_009511.fna | Rich |
| Staphylococcus^a | 3 | Staphylococcus_saprophyticus_ATCC_15305_uid58411/NC_007350.fna | Poor |
| Stenotrophomonas^a | 1 | Stenotrophomonas_maltophilia_D457_uid162199/NC_017671.fna | Rich |
| Streptococcus^a | 1 | Streptococcus_oralis_Uo5_uid65449/NC_015291.fna | Poor |
| Streptophyta^a | 2 | NA | |
| Williamsia^a | 15 | NA | |
| Wolbachia^b | Endo | Wolbachia_endosymbiont_of_Culex_quinquefasciatus_Pel_uid61645/NC_010981.fna | Poor |
| Xanthobacter^a | 1 | Xanthobacter_autotrophicus_Py2_uid58453/NC_009720.fna | Rich |

Abbreviation: NA, not available. ^a Bacterial species detected by Gumiel et al.8 ^b Bacterial species detected by da Mota et al.7 Freq. gives the numbers in Table 2 of Gumiel et al,8 ie, the maximum absolute number of times a genus was detected across 4 samples. Low, low but uncharacterized level of detection of a genus by DGGE in da Mota et al.7 Endo, high but uncharacterized level of detection of an endosymbiont by DGGE in da Mota et al.7

Statistics

Because environmental conditions differ, GC-poor and GC-rich bacteria were expected to grow differently in the DTT and in LB. A marker is therefore needed to verify that the model matches the bench experiment (shotgun sequences). As a marker, we chose the ratio (proportion) of (1) the enzymatic functions overrepresented in GC-rich compared with GC-poor bacteria relative to (2) the whole DNA-encoded protein sample for the type of bacteria considered (GC-poor or GC-rich).
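The marker comparison reduces to a two-proportion Z test. A minimal Python sketch of formulas (1) and (2) of this section follows, where Y is the number of proteins annotated with an overrepresented EC number and n is the size of the protein sample for each group. The counts in the usage lines are invented for illustration, the function names are hypothetical, and |Z| > 1.96 is used as the rejection rule of a two-sided test at α = 0.05.

```python
import math


def two_proportion_z(y1: int, n1: int, y2: int, n2: int) -> float:
    """Z score for H0: p1 = p2, using the pooled proportion (Y1 + Y2) / (n1 + n2)."""
    p1, p2 = y1 / n1, y2 / n2
    pooled = (y1 + y2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; Z is undefined")
    return (p1 - p2) / se


def reject_equal_proportions(z: float, critical: float = 1.96) -> bool:
    """Two-sided test at alpha = 0.05: reject H0 when |Z| exceeds 1.96."""
    return abs(z) > critical


# Invented counts: Y = proteins hitting overrepresented EC numbers, n = sample size.
z = two_proportion_z(30, 100, 10, 100)
print(f"Z = {z:.2f}, reject H0: {reject_equal_proportions(z)}")  # prints: Z = 3.54, reject H0: True
```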
Because the relative proportion of these enzymatic functions differs between GC-rich and GC-poor bacteria in the model, the null hypothesis that both proportions (one for GC-rich and one for GC-poor bacteria) are equal should be rejected in the bench experiment if it mirrors the model. There are at least 4 methods to test the equality of 2 proportions; the one presented here is based on the Z score (https://onlinecourses.science.psu.edu/stat414/node/268). With this method, the hypothesis of equality of 2 proportions, H0: p1 = p2, is rejected at a risk α = 0.05 if the absolute value of Z exceeds the reference value 1.96. Z is calculated with formula (1):

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \qquad (1)$$

where $\hat{p}_1 = Y_1/n_1$ and $\hat{p}_2 = Y_2/n_2$ are the sample proportions and

$$\hat{p} = \frac{Y_1 + Y_2}{n_1 + n_2} \qquad (2)$$

is the proportion of successes in the 2 samples combined. Y1 and Y2 are the absolute frequencies of success in samples 1 and 2, and n1 and n2 are the corresponding sample sizes.

Results

Among the bacterial genera isolated from Triatoma spp., those with GC-rich genomes outnumber (75%) those with GC-poor genomes (25%).8 In addition, among the genera described by Gumiel et al,8 the most widely represented include species with GC-rich genomes, whereas the marginally represented genera include species with GC-poor genomes (Table 2). Being GC-rich is clearly not sufficient for a bacterium to outperform others in the gut of triatomines, because 45% of the other, minor bacterial species were also GC-rich.8 However, because all the outperforming bacteria (6 species from 6 genera in 6 different families, 5 from Actinomycetales and 1 from Enterobacteriales) were GC-rich, the possibility that GC level is a key factor for these bacteria in the DTT cannot be ignored (Table 2).
Comparing GC3 with genome size and gene number for the species of the genera identified by Gumiel et al8 with a complete genome sequence available revealed significant positive correlations for GC3 vs genome size (r = .66, P < .01), GC3 vs gene number (r = .61, P < .01), and genome size vs gene number (r = .99, P < .01) (Table 3). Although the last correlation is trivial in bacteria, the first is not (and the second follows from the first, given the third). Given the positive correlation between GC3 and genome size, the GC-rich genomes of the DTT bacterial microbiota potentially support a more complex metabolism than GC-poor genomes, which seems to be an advantage in this system. A closer look at Table 3 shows that Corynebacterium has a small genome (at least in the species considered here); this is not necessarily a contradiction, because several other Corynebacterineae in the DTT have large genomes. It simply suggests that Corynebacterium (at least the species considered here) may be undergoing genome reduction relative to the enzymatic apparatus of its family. When the relative frequencies of enzymatic activities in GC-poor and GC-rich bacteria were compared according to the first digit of the EC numbers (Table 4), oxidoreductases emerged as a possible explanation for the success of GC-rich bacteria, being on average about 2 times more frequent (factor 1.9) in GC-rich than in GC-poor bacteria.
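As a check on the coefficients reported in Table 3, the Pearson correlations can be recomputed from its GC3, genome size, and gene number columns. This minimal sketch uses plain Python (no SciPy), transcribes the 19 rows of Table 3, and does not compute P values. The values it prints should come out close to the r values reported there.

```python
import math
from typing import Sequence

# (species, GC3 %, genome size in bp, gene number), transcribed from Table 3.
TABLE3 = [
    ("Staphylococcus saprophyticus", 22.9, 2_516_573, 2445),
    ("Haemophilus somnus", 28.3, 2_263_855, 1980),
    ("Enterococcus faecalis", 29.8, 2_987_449, 2876),
    ("Streptococcus oralis", 37.7, 1_958_688, 1905),
    ("Marinomonas posidonica", 41.7, 3_899_938, 3491),
    ("Pectobacterium carotovorum", 56.7, 4_862_911, 4246),
    ("Erwinia amylovora", 59.6, 3_805_872, 3437),
    ("Serratia marcescens", 72.5, 4_858_215, 4361),
    ("Comamonas testosteroni", 74.8, 5_373_642, 4802),
    ("Mycobacterium marinum", 78.0, 6_636_826, 5423),
    ("Bradyrhizobium japonicum", 78.9, 9_207_382, 8826),
    ("Gordonia polyisoprenivorans", 81.3, 5_669_804, 4945),
    ("Propionibacterium propionicum", 81.6, 3_449_358, 2938),
    ("Stenotrophomonas maltophilia", 83.0, 4_769_154, 4101),
    ("Corynebacterium terpenotabidum", 83.3, 2_751_232, 2369),
    ("Rhodococcus opacus", 83.5, 7_913_449, 7246),
    ("Xanthobacter autotrophicus", 84.0, 5_308_932, 4746),
    ("Pseudomonas aeruginosa", 84.3, 6_342_033, 5762),
    ("Sphingomonas wittichii", 86.7, 5_382_259, 4850),
]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


gc3 = [row[1] for row in TABLE3]
size = [row[2] for row in TABLE3]
genes = [row[3] for row in TABLE3]
for label, a, b in (("GC3 vs genome size", gc3, size),
                    ("GC3 vs gene number", gc3, genes),
                    ("genome size vs gene number", size, genes)):
    print(f"{label}: r = {pearson_r(a, b):.2f}")
```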
According to this observation, the enzymatic comparison of the second digit (Table 5) revealed a larger number of subcategories (6, ie, acting on the CH-CH group of donors—EC:1.3.-.-, acting on the CH-NH2 group of donors—EC:1.4.-.-, acting on single donors with incorporation of molecular oxygen—EC:1.13.-.-, acting on paired donors with incorporation or reduction of molecular oxygen—EC:1.14.-.-, acting on iron-sulfur proteins as donors— EC:1.18.-.-) with a larger EC number frequency (≥2 times more frequent) in GC-rich compared with GC-poor bacteria in oxidoreductases than in the other 4 categories of the first digit level, ie, 2 (acting on ether bonds—EC:3.3.-.-, acting on carbon-carbon bonds—EC:3.7.-.-) in hydrolases, 1 (intramolecular lyases— EC:5.5.-.-) in isomerases, and 1 (forming carbon-sulfur bonds— EC:6.2.-.-) in ligases. The largest differences of EC relative 6 Table 2. GC content of coding sequences associated with bacterial genera in digestive tract of triatomines. PhYLUM CLASS ORDER SUBORDER FAMILY SPECIES N GC CLASS GC, % σGC GC1, % σGC1 GC2, % σGC2 GC3, % σGC3 FRa Firmicutes Bacilli Bacillales NA Staphylococcaceae Staphylococcus saprophyticus 1844 Poor 33.8 4.6 46.8 5.4 31.7 4.6 22.9 3.8 Lb Proteobacteria α-Proteobacteria Rickettsiales NA Anaplasmataceae Wolbachia sp. 
1037 Poor 34.4 4.5 44.3 4.8 32.9 4.3 26.1 4.3 Endoc Proteobacteria γ-Proteobacteria Pasteurellales NA Pasteurellaceae Haemophilus somnus 1601 Poor 37.5 5.1 49.5 5.1 34.9 4.5 28.3 5.7 L Firmicutes Bacilli Lactobacillales NA Enterococcaceae Enterococcus faecalis 2249 Poor 37.8 4.7 49.4 5.0 34.3 4.7 29.8 4.5 L Firmicutes Bacilli Lactobacillales NA Streptococcaceae Streptococcus oralis 1451 Poor 41.3 5.9 52.2 5.8 34.0 4.6 37.7 7.2 L Proteobacteria γ-Proteobacteria Oceanospirillales NA Oceanospirillaceae Marinomonas posidonica 2795 Poor 44.6 4.6 54.0 4.4 37.9 3.9 41.7 5.3 L γ-Proteobacteria Enterobacteriales NA Enterobacteriaceae Arsenophonus nasoniae 306 Poor 42.3 6.0 45.2 5.9 39.6 6.0 42.2 6.1 Endo γ-Proteobacteria Enterobacteriales NA Enterobacteriaceae Pectobacterium carotovorum 3322 Rich 52.2 6.4 59.0 6.0 40.9 4.7 56.7 8.6 L Proteobacteria γ-Proteobacteria Enterobacteriales NA Enterobacteriaceae Erwinia amylovora 2478 Rich 53.9 6.7 60.2 6.4 41.7 4.9 59.6 9.0 L Actinobacteria Actinobacteria Actinomycetales Corynebacterineae Dietziaceae Dietzia spp. 
157 Rich 68.0 4.7 69.1 5.3 66.3 4.3 68.6 4.4 hd Proteobacteria γ-Proteobacteria Enterobacteriales NA Enterobacteriaceae Serratia marcescens 3620 Rich 59.4 7.5 63.2 6.6 42.7 5.1 72.5 10.8 h Proteobacteria β-Proteobacteria Burkholderiales NA Comamonadaceae Comamonas testosteroni 3941 Rich 62.0 6.2 65.2 5.3 45.9 4.9 74.8 8.3 L Actinobacteria Actinobacteria Actinomycetales Corynebacterineae Mycobacteriaceae Mycobacterium marinum 4464 Rich 65.6 5.4 68.0 4.9 50.6 5.5 78.0 5.8 h Proteobacteria α-Proteobacteria Rhizobiales NA Bradyrhizobiaceae Bradyrhizobium japonicum 7029 Rich 64.0 6.1 65.1 4.9 47.9 4.9 78.9 8.6 L Actinobacteria Actinobacteria Actinomycetales Corynebacterineae Gordoniaceae Gordonia polyisoprenivorans 3870 Rich 67.1 5.4 69.0 5.0 51.1 5.2 81.3 6.0 h Actinobacteria Actinobacteria Actinomycetales NA Propionibacteriaceae Propionibacterium propionicum 2292 Rich 66.4 6.3 68.4 5.5 49.2 5.4 81.6 8.0 L Proteobacteria γ-Proteobacteria Xanthomonadales NA Xanthomonadaceae Stenotrophomonas maltophilia 3275 Rich 67.2 5.7 69.8 5.6 48.8 5.4 83.0 6.2 L Actinobacteria Actinobacteria Actinomycetales Corynebacterineae Corynebacteriaceae Corynebacterium terpenotabidum 1861 Rich 67.3 5.4 68.8 5.1 49.7 5.3 83.3 5.7 h Actinobacteria Actinobacteria Actinomycetales Corynebacterineae Nocardiaceae Rhodococcus opacus 5908 Rich 67.9 5.7 69.7 4.9 50.6 5.3 83.5 6.7 h Proteobacteria α-Proteobacteria Rhizobiales NA Xanthobacteraceae Xanthobacter autotrophicus 3783 Rich 67.6 6.0 69.1 5.6 49.7 5.3 84.0 7.3 L Proteobacteria γ-Proteobacteria Pseudomonadales NA Pseudomonadaceae Pseudomonas aeruginosa 4627 Rich 66.6 6.4 68.7 5.8 46.8 5.8 84.3 7.5 L Proteobacteria α-Proteobacteria Sphingomonadales NA Sphingomonadaceae Sphingomonas wittichii 4069 Rich 68.7 5.9 69.3 5.6 50.0 5.2 86.7 6.9 L Abbreviation: NA, not available. aFr is for the frequency of bacterial species reported in Gumiel et al.8 bL is for low frequency. cEndo is for endosymbiont. 
(d) h is for high frequency and is highlighted with a gray background. The shaded regions in Table 2 improve the contrast between GC-rich (gray) and GC-poor (white) genomes.

Table 3. Relationships between GC3, genome size, and gene number in the representative bacterial species with complete genome sequence of bacterial genera found in the intestinal tract of triatomines.

| Species | GC3 (%) | Genome (bp) | Genes (no.) |
|---|---|---|---|
| Staphylococcus saprophyticus (NC_007350) | 22.9 | 2 516 573 | 2445 |
| Haemophilus somnus (NC_010519) | 28.3 | 2 263 855 | 1980 |
| Enterococcus faecalis (NC_018221) | 29.8 | 2 987 449 | 2876 |
| Streptococcus oralis (NC_015291) | 37.7 | 1 958 688 | 1905 |
| Marinomonas posidonica (NC_015559) | 41.7 | 3 899 938 | 3491 |
| Pectobacterium carotovorum (NC_012917) | 56.7 | 4 862 911 | 4246 |
| Erwinia amylovora (NC_013971) | 59.6 | 3 805 872 | 3437 |
| Serratia marcescens (NC_020064) | 72.5 | 4 858 215 | 4361 |
| Comamonas testosteroni (NC_013446) | 74.8 | 5 373 642 | 4802 |
| Mycobacterium marinum (NC_010612) | 78.0 | 6 636 826 | 5423 |
| Bradyrhizobium japonicum (NC_017249) | 78.9 | 9 207 382 | 8826 |
| Gordonia polyisoprenivorans (NC_013441) | 81.3 | 5 669 804 | 4945 |
| Propionibacterium propionicum (NC_018142) | 81.6 | 3 449 358 | 2938 |
| Stenotrophomonas maltophilia (NC_017671) | 83.0 | 4 769 154 | 4101 |
| Corynebacterium terpenotabidum (NC_021663) | 83.3 | 2 751 232 | 2369 |
| Rhodococcus opacus (NC_012522) | 83.5 | 7 913 449 | 7246 |
| Xanthobacter autotrophicus (NC_009720) | 84.0 | 5 308 932 | 4746 |
| Pseudomonas aeruginosa (NC_021577) | 84.3 | 6 342 033 | 5762 |
| Sphingomonas wittichii (NC_009511) | 86.7 | 5 382 259 | 4850 |

Correlations: r(GC3 × genome size) (a) = 0.66; r(GC3 × gene number) (b) = 0.61; r(genome size × gene number) (c) = 0.99.
(a) Correlation for GC3 vs genome size. (b) Correlation for GC3 vs gene number. (c) Correlation for genome size vs gene number. The shaded regions in Table 3 improve the contrast between GC-rich (gray) and GC-poor (white) genomes.

frequency, according to the second digit within each first-digit category, between GC-poor and GC-rich bacteria were due to enzymes acting on single donors with incorporation of molecular oxygen (EC 1.13.-.-; difference of ~24 times) and enzymes acting on paired donors with incorporation or reduction of molecular oxygen (EC 1.14.-.-; difference of ~8 times). The differences for enzymes acting on iron-sulfur proteins as donors (EC 1.18.-.-; ~5 times), acting on ether bonds (EC 3.3.-.-; ~6 times), acting on carbon-carbon bonds (EC 3.7.-.-; ~4 times), and forming carbon-sulfur bonds (EC 6.2.-.-; ~4 times) were also relatively large, the other differences being around 2 times. Thus, the functional variability of the enzymatic apparatus seems to be important for a bacterium to outperform the others in the intestinal environment of triatomines. In addition, although enzymes acting on diphenols and related substances as donors (EC 1.10.-.-) occur only at a low rate in all GC-rich bacteria of Table 5, they are entirely absent from the GC-poor bacteria found in the DTT. This kind of function is an example of the larger metabolic complexity of GC-rich bacteria in the DTT environment. The comparison of the third digit of EC numbers (Table 6) between GC-poor and GC-rich bacteria of the DTT showed a consistently larger number of enzymes involved in oxygen and nitrogen processing in GC-rich bacteria, suggesting a much greater ability to degrade complex substrates of high chemical stability, such as those containing aromatic rings (aryls). These enzymes fall mainly under EC 1.13.11.-, 1.4.3.-, 1.3.99.-, and 1.14.99.-, and to a lesser extent under 1.13.12.-, 1.14.11.-, 1.14.12.-, 1.14.13.-, 1.1.99.-, and 1.7.99.-.
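The three correlations listed under Table 3 can be recomputed from the tabulated values. The following is a minimal sketch, assuming that they are plain Pearson product-moment coefficients computed over the 19 species listed (the article does not name the coefficient used). The `TABLE3` dictionary and the `pearson` helper are illustrative names introduced here, not part of the original analysis.

```python
from math import sqrt

# Table 3 values: species -> (GC3 in %, genome size in bp, number of genes)
TABLE3 = {
    "Staphylococcus saprophyticus": (22.9, 2_516_573, 2445),
    "Haemophilus somnus": (28.3, 2_263_855, 1980),
    "Enterococcus faecalis": (29.8, 2_987_449, 2876),
    "Streptococcus oralis": (37.7, 1_958_688, 1905),
    "Marinomonas posidonica": (41.7, 3_899_938, 3491),
    "Pectobacterium carotovorum": (56.7, 4_862_911, 4246),
    "Erwinia amylovora": (59.6, 3_805_872, 3437),
    "Serratia marcescens": (72.5, 4_858_215, 4361),
    "Comamonas testosteroni": (74.8, 5_373_642, 4802),
    "Mycobacterium marinum": (78.0, 6_636_826, 5423),
    "Bradyrhizobium japonicum": (78.9, 9_207_382, 8826),
    "Gordonia polyisoprenivorans": (81.3, 5_669_804, 4945),
    "Propionibacterium propionicum": (81.6, 3_449_358, 2938),
    "Stenotrophomonas maltophilia": (83.0, 4_769_154, 4101),
    "Corynebacterium terpenotabidum": (83.3, 2_751_232, 2369),
    "Rhodococcus opacus": (83.5, 7_913_449, 7246),
    "Xanthobacter autotrophicus": (84.0, 5_308_932, 4746),
    "Pseudomonas aeruginosa": (84.3, 6_342_033, 5762),
    "Sphingomonas wittichii": (86.7, 5_382_259, 4850),
}


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Unpack the three columns of the table
gc3, genome_bp, genes = zip(*TABLE3.values())
r_gc3_size = pearson(gc3, genome_bp)
r_gc3_genes = pearson(gc3, genes)
r_size_genes = pearson(genome_bp, genes)
print(f"r(GC3, genome size)         = {r_gc3_size:.2f}")
print(f"r(GC3, gene number)         = {r_gc3_genes:.2f}")
print(f"r(genome size, gene number) = {r_size_genes:.2f}")
```

Run as is, the three coefficients should come out close to the tabulated 0.66, 0.61, and 0.99. This supports the reading that Table 3 reports ordinary Pearson correlations.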
Table 4. Relative frequency (%) of enzymatic functions in GC-poor and GC-rich bacterial species found in triatomine digestive tract according to the first EC number digit.
7.5 6. -.-.- Ligases 1.0 1.5 7.1 7.6 8.4 8.3 6.6 4.7 0.4 7.1 7.3 6.6 5.3 5. -.-.- 7.1 Isomerases 0.7 1.0 4.5 4.3 4.3 4.5 3.3 6.2 1.1 6.1 5.1 6.7 8.2 4. -.-.- 7.5 Lyases 1.1 0.6 9.3 8.9 9.3 9.6 8.5 10.1 1.4 8.1 7.2 6.9 29.3 3. -.-.- 10.0 Hydrolases 0.7 3.8 22.8 19.8 25.3 19.6 21.2 28.3 3.9 31.5 33.9 35.5 31.9 2. -.-.- 27.2 Transferases 0.9 2.6 28.7 25.3 32.4 27.6 28.8 29.5 1.2 32.8 33.7 14.4 12.8 12.7 17.9 1. -.-.- 14.4 31.6 1.9 6.4 27.5 34.2 20.3 30.5 31.6 2.4 21.2 83.5 83.3 81.3 78.0 72.5 37.7 29.8 28.3 22.9 GC3,% 33.8 Oxidoreductases CLASS OF ENZYME FUNCTION AVGCR/AVGCP (a) SD AVERAGE RO CT GP MM SM SD AVERAGE SO EF HS SS EC NO.
Abbreviations: Ct, Corynebacterium terpenotabidum; EC no., Enzyme Commission number; Ef, Enterococcus faecalis; Gp, Gordonia polyisoprenivorans; Hs, Haemophilus somnus; Mm, Mycobacterium marinum; Ro, Rhodococcus opacus; Sm, Serratia marcescens; So, Streptococcus oralis; Ss, Staphylococcus saprophyticus.
(a) Factor difference, where AvGCr is the average for GC-rich and AvGCp the average for GC-poor genomes. The shaded regions in Table 4 improve the contrast between GC-rich (gray) and GC-poor (white) genomes.

With the list of EC numbers (n = 30) that are overrepresented in GC-rich bacteria according to the first 3 digits (Table 6), the protein sequences corresponding to EC numbers fully specified at all 4 digits (n = 778) were retrieved for (1) S saprophyticus (n = 39), H somnus (n = 25), E faecalis (n = 29), and S oralis (n = 11), ie, 26 EC numbers on average (σ = 11.6) for GC-poor bacteria, and (2) C terpenotabidum (n = 55), G polyisoprenivorans (n = 96), M marinum (n = 73), R opacus (n = 116), and S marcescens (n = 95), ie, 87 EC numbers on average (σ = 23.0) for GC-rich bacteria.
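The averages quoted above and the class shares reported below for the 35 overrepresented reactions of Table 8 are simple tallies. The following is a minimal sketch that uses only the counts and EC numbers printed in this article to reproduce them. The variable names are illustrative, and the mapping from the first EC digit to an enzyme class follows the standard EC top-level classes.

With the counts as printed, the GC-rich sample standard deviation comes out at about 23.5 rather than the 23.0 given in the text. The means (26 and 87), their ratio (about 3.3), and the class shares (74%, 9%, 9%, 6%, 3%) all match.

```python
from collections import Counter
from statistics import mean, stdev

# 4-digit EC numbers (from the Table 6 list) retrieved per species, as quoted in the text
gc_poor_counts = [39, 25, 29, 11]        # Ss, Hs, Ef, So
gc_rich_counts = [55, 96, 73, 116, 95]   # Ct, Gp, Mm, Ro, Sm (S marcescens)

poor_avg, rich_avg = mean(gc_poor_counts), mean(gc_rich_counts)
fold = rich_avg / poor_avg
print(f"GC-poor: mean = {poor_avg:.1f}, sample SD = {stdev(gc_poor_counts):.1f}")
print(f"GC-rich: mean = {rich_avg:.1f}, sample SD = {stdev(gc_rich_counts):.1f}")
print(f"GC-rich / GC-poor = {fold:.2f}")

# The 35 overrepresented reactions of Table 8, in the order of that table
TABLE8_EC = [
    "1.13.11.2", "1.7.99.4", "1.14.13.1", "2.7.11.1", "6.2.1.1",
    "1.2.1.27", "1.2.1.8", "6.2.1.3", "3.1.6.1", "1.1.99.1", "1.2.1.3",
    "1.13.11.24", "1.14.99.3", "1.4.1.13", "1.4.1.1",
    "4.3.3.7", "1.2.1.2", "1.6.1.2", "1.3.99.1", "1.4.4.2", "1.17.7.1",
    "2.10.1.1", "1.2.1.7", "1.4.1.2", "1.18.1.2", "1.2.1.38", "3.3.2.1",
    "6.2.1.26", "1.2.1.10", "1.2.1.70",
    "1.4.3.1", "1.5.3.1", "1.14.13.8", "1.18.1.3", "2.8.3.1",
]
# First EC digit -> top-level enzyme class
EC_CLASSES = {"1": "oxidoreductases", "2": "transferases", "3": "hydrolases",
              "4": "lyases", "5": "isomerases", "6": "ligases"}

class_counts = Counter(EC_CLASSES[ec.split(".")[0]] for ec in TABLE8_EC)
for name, n in class_counts.most_common():
    print(f"{name}: {n}/{len(TABLE8_EC)} = {100 * n / len(TABLE8_EC):.0f}%")
```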
These averages mean that, for the list of Table 6, GC-rich bacteria have on average 3.3 times more enzymatic functionalities than GC-poor ones. This supports the hypothesis that GC-rich bacteria outperform GC-poor bacteria because of their more complex metabolism, which appears to be an advantage in the DTT. Most enzymatic activities were found among dehydrogenases (aldehyde and amino acid), oxygenases (mono- and di-), and ligases (acetate-CoA), which are involved in the very first steps of molecular degradation and synthesis.

Table 7 shows that, among the 35 enzymatic reactions overrepresented in GC-rich bacteria, the large majority are oxidoreductases (74%), followed by transferases and ligases (9% each), with hydrolases and lyases in last position accounting for only 6% and 3%, respectively. Ranking the enzyme groups of Table 7 by decreasing factor difference (AvGCr/AvGCp) shows which of them best explain the differences between GC-rich and GC-poor bacteria; the data of Table 7 reorganized in this way are given in Table 8. From the divisions in Table 8, we conclude that oxygenases (incorporation of oxygen into organic substrates) and CoA ligases (a central function in energy storage) account for the main significant differences between GC-poor and GC-rich bacteria, compared with the more basic metabolic functions. This difference also emphasizes the larger variety of enzymatic systems in GC-rich than in GC-poor bacteria in the DTT. Finally, although the enzymes ranked AvGCr/AvGCp < 2 have a low relative frequency in GC-rich bacteria, they are absent from GC-poor bacteria and are therefore probably significant too.

The shotgun sequencing of bacteria from the DTT grown in LB medium produced a total of 723 543 reads whose size followed a bimodal distribution, with the main mode at 350 bp (Figure 1A).
The average size increased significantly after contig assembly (Figure 1B); however, most of the ORFs remained below 100 bp (Figure 1C). Filtering of cORFs with UFM resulted in a final sample of 35 105 cORFs, mostly in the range of 100 to 300 bp (Figure 1D), which is within the acceptable limit for homology comparison. From the shotgun sequence samples, 2233 significant homologies (best hit) were found between the 35 105 putative cORFs and the representative species of the genera of GC-rich bacteria reported by Gumiel et al.8 Among these 2233 sequences, 425 (19.0%) had significant homologies with KEGG for the EC number list of Table 6. In contrast, 19 715 significant homologies (best hit) were found between the 35 105 putative cORFs and the representative species of the genera of GC-poor bacteria reported by Gumiel et al.8 Among these 19 715 sequences, 1424 (7.2%) had significant homologies with KEGG for the EC number list of Table 6.

Because the sample sizes are very different, one must check the statistical consistency of the factor of ~2.7 (close to the theoretical value of 3.3 found with the model) between the overrepresented enzymes (Tables 6 and 7) observed in GC-rich and in GC-poor bacteria. The Z test for the comparison of 2 proportions formally rejects the null hypothesis of equal proportions, because Zobs (19) > Zth (1.96). Thus, although fermentation in LB medium introduces a bias favoring the growth of GC-poor bacteria, the shotgun DNA sequencing shows that overrepresented enzymatic activities are relatively more frequent in a medium fermented by GC-rich bacteria than in one fermented by GC-poor bacteria. This agrees with the model analysis based on the complete genome sequences available from the National Center for Biotechnology Information (NCBI).

Table 5. Relative frequency (%) of enzymatic functions in GC-poor and GC-rich bacterial species found in triatomine digestive tract according to the first 2 EC number digits.

| EC no. | Ss | Hs | Ef | So | Average | SD | Sm | Mm | Gp | Ct | Ro | Average | SD | AvGCr/AvGCp (a) | Class of enzyme function |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GC3, % | 22.9 | 28.3 | 29.8 | 37.7 | | | 72.5 | 78.0 | 81.3 | 83.3 | 83.5 | | | | |
| 1.3.-.- | 1.3 | 1.4 | 1.2 | 1.3 | 1.3 | 0.1 | 1.6 | 5.1 | 4.6 | 2.8 | 5.2 | 3.8 | 1.6 | 2.9 | Acting on the CH-CH group of donors |
| 1.4.-.- | 0.9 | 0.2 | 0.3 | 0.3 | 0.4 | 0.3 | 0.9 | 0.9 | 0.9 | 1.3 | 1.6 | 1.1 | 0.3 | 2.7 | Acting on the CH-NH2 group of donors (amino acid oxidoreductases) |
| 1.10.-.- | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 | 0.1 | 0.2 | 0.3 | 0.1 | 0.2 | 0.1 | ∞ | Acting on diphenols and related substances as donors (diphenol oxidoreductases) |
| 1.13.-.- | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.6 | 0.6 | 0.8 | 0.1 | 1.5 | 0.7 | 0.5 | 24.5 | Acting on single donors with incorporation of molecular oxygen (oxygenases) |
| 1.14.-.- | 0.8 | 0.6 | 0.1 | 0.1 | 0.4 | 0.4 | 1.4 | 4.3 | 4.9 | 2.1 | 3.8 | 3.3 | 1.5 | 8.2 | Acting on paired donors, with incorporation or reduction of molecular oxygen |
| 1.18.-.- | 0.1 | 0.0 | 0.2 | 0.0 | 0.1 | 0.1 | 0.3 | 0.2 | 0.4 | 0.4 | 0.5 | 0.4 | 0.1 | 5.0 | Acting on iron-sulfur proteins as donors |
| 3.3.-.- | 0.1 | 0.0 | 0.1 | 0.0 | 0.0 | 0.1 | 0.1 | 0.4 | 0.2 | 0.3 | 0.4 | 0.3 | 0.1 | 6.1 | Acting on ether bonds |
| 3.7.-.- | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.1 | 0.2 | 0.0 | 0.2 | 0.1 | 0.1 | 4.4 | Acting on carbon-carbon bonds |
| 5.5.-.- | 0.0 | 0.1 | 0.0 | 0.1 | 0.1 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.1 | 0.1 | 0.0 | 2.2 | Intramolecular lyases |
| 6.2.-.- | 0.6 | 0.5 | 0.5 | 0.1 | 0.4 | 0.2 | 0.5 | 2.3 | 1.7 | 0.6 | 3.3 | 1.7 | 1.2 | 4.0 | Forming carbon-sulfur bonds |

Abbreviations: Ct, Corynebacterium terpenotabidum; EC no., Enzyme Commission number; Ef, Enterococcus faecalis; Gp, Gordonia polyisoprenivorans; Hs, Haemophilus somnus; Mm, Mycobacterium marinum; Ro, Rhodococcus opacus; Sm, Serratia marcescens; So, Streptococcus oralis; Ss, Staphylococcus saprophyticus.
(a) Factor difference, where AvGCr is the average for GC-rich and AvGCp the average for GC-poor genomes. The shaded regions in Table 5 improve the contrast between GC-rich (gray) and GC-poor (white) genomes and the readability of the table.

Table 6. Relative frequency (%) of enzymatic functions in GC-poor and GC-rich bacterial species found in the triatomine digestive tract according to the first 3 EC number digits.

| EC no. | Ss | Hs | Ef | So | Average | SD | Sm | Mm | Gp | Ct | Ro | Average | SD | AvGCr/AvGCp (a) | Class of enzyme function |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GC3, % | 22.9 | 28.3 | 29.8 | 37.7 | | | 72.5 | 78.0 | 81.3 | 83.3 | 83.5 | | | | |
| 1.1.2.- | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.2 | 0.1 | 0.1 | 0.0 | 0.1 | 0.1 | ∞ | With a cytochrome as acceptor |
| 1.1.3.- | 0.0 | 0.0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.0 | 0.2 | 0.3 | 0.0 | 0.4 | 0.2 | 0.2 | 2.8 | With oxygen as acceptor |
| 1.1.99.- | 0.1 | 0.0 | 0.1 | 0.0 | 0.1 | 0.1 | 0.5 | 0.1 | 0.3 | 0.1 | 0.2 | 0.2 | 0.2 | 4.8 | With unknown physiological acceptors |
| 1.2.1.- | 1.5 | 0.9 | 0.9 | 0.3 | 0.9 | 0.5 | 1.5 | 2.1 | 2.6 | 1.8 | 3.5 | 2.3 | 0.8 | 2.6 | With NAD+ or NADP+ as acceptor |
| 1.3.99.- | 0.3 | 0.6 | 0.2 | 0.1 | 0.3 | 0.2 | 0.6 | 4.5 | 3.4 | 1.8 | 4.7 | 3.0 | 1.8 | 9.6 | With unknown physiological acceptors |
| 1.4.1.- | 0.6 | 0.1 | 0.3 | 0.1 | 0.3 | 0.2 | 0.5 | 0.3 | 0.5 | 0.7 | 0.9 | 0.6 | 0.2 | 2.0 | With NAD+ or NADP+ as acceptor |
| 1.4.3.- | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.3 | 0.5 | 0.4 | 0.3 | 0.6 | 0.4 | 0.1 | 14.2 | With oxygen as acceptor |
| 1.4.4.- | 0.2 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.1 | 0.2 | 0.0 | 0.3 | 0.1 | 0.1 | 0.1 | 2.8 | With a disulfide as acceptor |
| 1.5.3.- | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.1 | 0.5 | 0.1 | 0.2 | 0.2 | 0.2 | ∞ | With oxygen as acceptor |
| 1.6.1.- | 0.0 | 0.2 | 0.0 | 0.0 | 0.1 | 0.1 | 0.2 | 0.3 | 0.3 | 0.1 | 0.1 | 0.2 | 0.1 | 3.5 | With NAD+ or NADP+ as acceptor |
| 1.7.99.- | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.7 | 0.1 | 0.0 | 0.0 | 0.3 | 0.2 | 0.3 | 6.7 | With unknown physiological acceptors |
| 1.9.3.- | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.2 | 0.3 | 0.1 | 0.1 | 0.1 | 5.3 | With oxygen as acceptor |
| 1.13.11.- | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.5 | 0.4 | 0.6 | 0.0 | 1.2 | 0.5 | 0.4 | 17.5 | With incorporation of 2 atoms of oxygen |
| 1.13.12.- | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.3 | 0.3 | 0.1 | 0.5 | 0.2 | 0.2 | ∞ | With incorporation of 1 atom of oxygen (internal monooxygenases or internal mixed function oxidases) |
| 1.14.11.- | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.1 | 0.2 | 0.1 | 0.3 | 0.0 | 0.2 | 0.1 | 0.1 | 4.3 | With 2-oxoglutarate as 1 donor, and incorporation of 1 atom each of oxygen into both donors |
| 1.14.12.- | 0.2 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.2 | 0.0 | 0.2 | 0.4 | 0.4 | 0.2 | 0.2 | 4.5 | With NADH or NADPH as 1 donor, and incorporation of 2 atoms of oxygen into 1 donor |
| 1.14.13.- | 0.5 | 0.5 | 0.1 | 0.0 | 0.3 | 0.3 | 0.6 | 1.0 | 3.0 | 0.6 | 1.7 | 1.4 | 1.0 | 4.9 | With NADH or NADPH as 1 donor, and incorporation of 1 atom of oxygen |
| 1.14.99.- | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 | 0.1 | 0.3 | 0.3 | 0.4 | 0.2 | 0.2 | 9.2 | Miscellaneous |
| 1.17.7.- | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.0 | 0.0 | 0.1 | 0.0 | 2.1 | With an iron-sulfur protein as acceptor |
| 1.18.1.- | 0.1 | 0.0 | 0.2 | 0.0 | 0.1 | 0.1 | 0.2 | 0.2 | 0.4 | 0.4 | 0.6 | 0.4 | 0.2 | 5.1 | With NAD+ or NADP+ as acceptor |
| 2.7.11.- | 0.4 | 0.0 | 0.2 | 0.3 | 0.2 | 0.2 | 0.2 | 0.8 | 1.1 | 0.6 | 1.3 | 0.8 | 0.4 | 3.6 | Protein-serine/threonine kinases |
| 2.8.3.- | 0.1 | 0.2 | 0.0 | 0.0 | 0.1 | 0.1 | 0.2 | 0.7 | 0.5 | 0.6 | 1.4 | 0.7 | 0.4 | 7.7 | CoA-transferases |
| 2.10.1.- | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.0 | 0.0 | 0.1 | 0.0 | 2.1 | Molybdenumtransferases or tungstentransferases with sulfide groups as acceptors |
| 3.1.6.- | 0.0 | 0.2 | 0.0 | 0.1 | 0.1 | 0.1 | 0.2 | 0.3 | 0.1 | 0.1 | 0.4 | 0.2 | 0.1 | 2.4 | Sulfuric ester hydrolases |
| 3.3.2.- | 0.1 | 0.0 | 0.1 | 0.0 | 0.1 | 0.1 | 0.1 | 0.4 | 0.2 | 0.1 | 0.4 | 0.3 | 0.2 | 5.1 | Ether hydrolases |
| 4.3.3.- | 0.0 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.3 | 0.0 | 0.0 | 0.1 | 0.0 | 0.1 | 0.1 | 4.1 | Amine-lyases |
| 5.1.99.- | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.3 | 0.1 | 0.3 | 0.2 | 0.1 | ∞ | Acting on other compounds |
| 5.3.3.- | 0.1 | 0.0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.3 | 0.4 | 0.3 | 0.1 | 0.2 | 0.3 | 0.1 | 3.2 | Transposing C=C bonds |
| 5.5.1.- | 0.0 | 0.1 | 0.0 | 0.1 | 0.1 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.1 | 0.2 | 0.0 | 2.4 | Miscellaneous |
| 6.2.1.- | 0.6 | 0.5 | 0.5 | 0.1 | 0.4 | 0.2 | 0.6 | 2.6 | 1.9 | 0.6 | 3.6 | 1.9 | 1.3 | 4.3 | Miscellaneous |

Abbreviations: Ct, Corynebacterium terpenotabidum; EC no., Enzyme Commission number; Ef, Enterococcus faecalis; Gp, Gordonia polyisoprenivorans; Hs, Haemophilus somnus; Mm, Mycobacterium marinum; Ro, Rhodococcus opacus; Sm, Serratia marcescens; So, Streptococcus oralis; Ss, Staphylococcus saprophyticus.
(a) Factor difference, where AvGCr is the average for GC-rich and AvGCp the average for GC-poor genomes. The shaded regions in Table 6 improve the contrast between GC-rich (gray) and GC-poor (white) genomes.

Table 7. Relative frequency (%) of enzymatic functions in GC-poor and GC-rich bacterial species found in the triatomine digestive tract according to the 4 EC number digits.

| EC no. | Ss | Hs | Ef | So | Average | SD | Sm | Mm | Gp | Ct | Ro | Average | SD | AvGCr/AvGCp (a) | Class of enzyme function |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GC3, % | 22.9 | 28.3 | 29.8 | 37.7 | | | 72.5 | 78.0 | 81.3 | 83.3 | 83.5 | | | | |
| 1.1.99.1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.5 | 2.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.4 | 0.5 | 5.6 | Choline dehydrogenase |
| 1.2.1.2 | 4.0 | 1.0 | 1.0 | 0.0 | 1.5 | 1.7 | 7.0 | 8.0 | 7.0 | 2.0 | 9.0 | 6.6 | 2.7 | 4.4 | Formate dehydrogenase |
| 1.2.1.3 | 3.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.4 | 3.0 | 4.0 | 5.0 | 2.0 | 14.0 | 5.6 | 4.8 | 5.6 | Aldehyde dehydrogenase (NAD+) |
| 1.2.1.7 | 1.0 | 1.0 | 0.0 | 0.0 | 0.5 | 0.6 | 1.0 | 3.0 | 2.0 | 2.0 | 1.0 | 1.8 | 0.8 | 3.6 | Benzaldehyde dehydrogenase (NADP+) |
| 1.2.1.8 | 1.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.5 | 5.0 | 0.0 | 1.0 | 1.0 | 2.0 | 1.8 | 1.9 | 7.2 | Betaine-aldehyde dehydrogenase |
| 1.2.1.10 | 0.0 | 0.0 | 3.0 | 0.0 | 0.8 | 1.5 | 1.0 | 1.0 | 1.0 | 0.0 | 5.0 | 1.6 | 1.9 | 2.1 | Acetaldehyde dehydrogenase (acetylating) |
| 1.2.1.16 | 1.0 | 1.0 | 0.0 | 0.0 | 0.5 | 0.6 | 1.0 | 1.0 | 2.0 | 1.0 | 2.0 | 1.4 | 0.5 | 2.8 | Glyceraldehyde-3-phosphate dehydrogenase (NADP+) (phosphorylating) |
| 1.2.1.27 | 0.0 | 1.0 | 0.0 | 0.0 | 0.3 | 0.5 | 1.0 | 1.0 | 3.0 | 0.0 | 5.0 | 2.0 | 2.0 | 8.0 | Methylmalonate-semialdehyde dehydrogenase (acylating) |
| 1.2.1.38 | 1.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.5 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.8 | 0.4 | 3.2 | N-acetyl-γ-glutamyl-phosphate reductase |
| 1.2.1.70 | 1.0 | 1.0 | 0.0 | 0.0 | 0.5 | 0.6 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 2.0 | Glutamyl-tRNA reductase |
| 1.3.99.1 | 2.0 | 3.0 | 1.0 | 0.0 | 1.5 | 1.3 | 8.0 | 6.0 | 9.0 | 2.0 | 7.0 | 6.4 | 2.7 | 4.3 | Succinate dehydrogenase |
| 1.4.1.1 | 3.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.4 | 3.0 | 4.0 | 6.0 | 4.0 | 8.0 | 5.0 | 2.0 | 5.0 | Alanine dehydrogenase |
| 1.4.1.2 | 2.0 | 0.0 | 0.0 | 0.0 | 0.5 | 1.0 | 2.0 | 1.0 | 0.0 | 0.0 | 6.0 | 1.8 | 2.5 | 3.6 | Glutamate dehydrogenase |
| 1.4.1.13 | 2.0 | 0.0 | 1.0 | 0.0 | 0.8 | 1.0 | 3.0 | 3.0 | 5.0 | 2.0 | 6.0 | 3.8 | 1.6 | 5.1 | Glutamate synthase (NADPH) |
| 1.4.3.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 2.0 | 3.0 | 1.6 | 0.9 | ∞ | D-aspartate oxidase |
| 1.4.4.2 | 2.0 | 0.0 | 0.0 | 0.0 | 0.5 | 1.0 | 2.0 | 3.0 | 0.0 | 2.0 | 3.0 | 2.0 | 1.2 | 4.0 | Glycine dehydrogenase (decarboxylating) |
| 1.5.3.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 1.0 | 5.0 | 1.0 | 5.0 | 3.2 | 2.0 | ∞ | Sarcosine oxidase |
| 1.6.1.2 | 0.0 | 2.0 | 0.0 | 0.0 | 0.5 | 1.0 | 2.0 | 2.0 | 3.0 | 1.0 | 3.0 | 2.2 | 0.8 | 4.4 | NAD(P)+ transhydrogenase (Re/Si-specific) |
| 1.7.99.4 | 0.0 | 1.0 | 0.0 | 0.0 | 0.3 | 0.5 | 12.0 | 1.0 | 0.0 | 0.0 | 6.0 | 3.8 | 5.2 | 15.2 | Nitrate reductase |
| 1.13.11.2 | 0.0 | 1.0 | 0.0 | 0.0 | 0.3 | 0.5 | 3.0 | 4.0 | 2.0 | 0.0 | 11.0 | 4.0 | 4.2 | 16.0 | Catechol 2,3-dioxygenase |
| 1.13.11.24 | 0.0 | 1.0 | 0.0 | 0.0 | 0.3 | 0.5 | 2.0 | 2.0 | 0.0 | 0.0 | 3.0 | 1.4 | 1.3 | 5.6 | Quercetin 2,3-dioxygenase |
| 1.14.12.1 | 2.0 | 0.0 | 0.0 | 0.0 | 0.5 | 1.0 | 3.0 | 0.0 | 1.0 | 1.0 | 3.0 | 1.6 | 1.3 | 3.2 | Anthranilate 1,2-dioxygenase (deaminating, decarboxylating) |
| 1.14.13.1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.5 | 5.0 | 1.0 | 4.0 | 1.0 | 6.0 | 3.4 | 2.3 | 13.6 | Salicylate 1-monooxygenase |
| 1.14.13.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 4.0 | 1.0 | 3.0 | 2.0 | 1.4 | ∞ | Dimethylaniline monooxygenase (N-oxide-forming) |
| 1.14.99.3 | 1.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.5 | 0.0 | 1.0 | 2.0 | 2.0 | 2.0 | 1.4 | 0.9 | 5.6 | Heme oxygenase (decyclizing) |
| 1.17.7.1 | 0.0 | 1.0 | 0.0 | 0.0 | 0.3 | 0.5 | 1.0 | 2.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.7 | 4.0 | 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase |
| 1.18.1.2 | 1.0 | 0.0 | 2.0 | 0.0 | 0.8 | 1.0 | 2.0 | 1.0 | 3.0 | 2.0 | 5.0 | 2.6 | 1.5 | 3.5 | Ferredoxin-NADP+ reductase |
| 1.18.1.3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 6.0 | 2.0 | 2.2 | ∞ | Ferredoxin-NAD+ reductase |
| 2.7.11.1 | 1.0 | 0.0 | 1.0 | 2.0 | 1.0 | 0.8 | 2.0 | 12.0 | 12.0 | 4.0 | 29.0 | 11.8 | 10.6 | 11.8 | Nonspecific serine/threonine protein kinase |
| 2.8.3.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 9.0 | 4.0 | 1.0 | 17.0 | 6.4 | 6.8 | ∞ | Propionate CoA-transferase |
| 2.10.1.1 | 0.0 | 1.0 | 0.0 | 0.0 | 0.3 | 0.5 | 1.0 | 2.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.7 | 4.0 | Molybdopterin molybdotransferase |
| 3.1.6.1 | 0.0 | 1.0 | 0.0 | 1.0 | 0.5 | 0.6 | 2.0 | 5.0 | 0.0 | 1.0 | 7.0 | 3.0 | 2.9 | 6.0 | Arylsulfatase |
| 3.3.2.1 | 1.0 | 0.0 | 1.0 | 0.0 | 0.5 | 0.6 | 1.0 | 4.0 | 1.0 | 0.0 | 1.0 | 1.4 | 1.5 | 2.8 | Isochorismatase |
| 4.3.3.7 | 0.0 | 0.0 | 1.0 | 0.0 | 0.3 | 0.5 | 5.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.2 | 2.2 | 4.8 | 4-hydroxy-tetrahydrodipicolinate synthase |
| 6.2.1.1 | 2.0 | 0.0 | 0.0 | 0.0 | 0.5 | 1.0 | 2.0 | 2.0 | 6.0 | 1.0 | 15.0 | 5.2 | 5.8 | 10.4 | Acetate-CoA ligase |
| 6.2.1.3 | 1.0 | 1.0 | 1.0 | 0.0 | 0.8 | 0.5 | 4.0 | 5.0 | 6.0 | 2.0 | 9.0 | 5.2 | 2.6 | 6.9 | Long-chain-fatty-acid-CoA ligase |
| 6.2.1.26 | 0.0 | 1.0 | 1.0 | 0.0 | 0.5 | 0.6 | 1.0 | 1.0 | 1.0 | 0.0 | 3.0 | 1.2 | 1.1 | 2.4 | o-Succinylbenzoate-CoA ligase |

Abbreviations: Ct, Corynebacterium terpenotabidum; EC no., Enzyme Commission number; Ef, Enterococcus faecalis; Gp, Gordonia polyisoprenivorans; Hs, Haemophilus somnus; Mm, Mycobacterium marinum; Ro, Rhodococcus opacus; Sm, Serratia marcescens; So, Streptococcus oralis; Ss, Staphylococcus saprophyticus.
(a) Factor difference, where AvGCr is the average for GC-rich and AvGCp the average for GC-poor genomes. The shaded regions in Table 7 improve the contrast between GC-rich (gray) and GC-poor (white) genomes.

Table 8. Enzyme reactions of Table 7 classified by decreasing AvGCr/AvGCp.

| AvGCr/AvGCp (a) | S. no. | EC no. | Enzymatic function |
|---|---|---|---|
| 10 to 16 | 1 | 1.13.11.2 | Catechol 2,3-dioxygenase |
| | 2 | 1.7.99.4 | Nitrate reductase |
| | 3 | 1.14.13.1 | Salicylate 1-monooxygenase |
| | 4 | 2.7.11.1 | Nonspecific serine/threonine protein kinase |
| | 5 | 6.2.1.1 | Acetate-CoA ligase |
| 5 to <10 | 1 | 1.2.1.27 | Methylmalonate-semialdehyde dehydrogenase |
| | 2 | 1.2.1.8 | Betaine-aldehyde dehydrogenase |
| | 3 | 6.2.1.3 | Long-chain-fatty-acid-CoA ligase |
| | 4 | 3.1.6.1 | Arylsulfatase |
| | 5 | 1.1.99.1 | Choline dehydrogenase |
| | 6 | 1.2.1.3 | Aldehyde dehydrogenases |
| | 7 | 1.13.11.24 | Quercetin 2,3-dioxygenase |
| | 8 | 1.14.99.3 | Heme oxygenase (biliverdin-producing) |
| | 9 | 1.4.1.13 | Glutamate synthase (NADPH) |
| | 10 | 1.4.1.1 | Alanine dehydrogenase |
| 4 to <5 | 1 | 4.3.3.7 | 4-hydroxy-tetrahydrodipicolinate synthase |
| | 2 | 1.2.1.2 | Formate dehydrogenase |
| | 3 | 1.6.1.2 | NAD(P)+ transhydrogenase (Re/Si-specific) |
| | 4 | 1.3.99.1 | Succinate dehydrogenase |
| | 5 | 1.4.4.2 | Glycine dehydrogenase (aminomethyl-transferring) |
| | 6 | 1.17.7.1 | 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase |
| 2 to <4 | 1 | 2.10.1.1 | Molybdopterin molybdotransferase |
| | 2 | 1.2.1.7 | Benzaldehyde dehydrogenase |
| | 3 | 1.4.1.2 | Glutamate dehydrogenase |
| | 4 | 1.18.1.2 | Ferredoxin-NADP+ reductase |
| | 5 | 1.2.1.38 | N-acetyl-γ-glutamyl-phosphate reductase |
| | 6 | 3.3.2.1 | Isochorismatase |
| | 7 | 6.2.1.26 | o-Succinylbenzoate-CoA ligase |
| | 8 | 1.2.1.10 | Acetaldehyde dehydrogenase |
| | 9 | 1.2.1.70 | Glutamyl-tRNA reductase |
| <2 | 1 | 1.4.3.1 | D-Aspartate oxidase |
| | 2 | 1.5.3.1 | Sarcosine oxidase |
| | 3 | 1.14.13.8 | Flavin-containing monooxygenase |
| | 4 | 1.18.1.3 | Ferredoxin-NAD+ reductase |
| | 5 | 2.8.3.1 | Propionate CoA-transferase |

Abbreviation: EC no., Enzyme Commission number.
(a) Factor difference, where AvGCr is the average for GC-rich and AvGCp the average for GC-poor genomes.

Discussion
A recent investigation compared the microbiota of the DTT in the presence or absence of T cruzi.24 Overall, it showed a predominance of GC-rich bacterial species (without considering the intracellular endosymbiont Arsenophonus)25, as previously described,7,8 except for Staphylococcus, which predominated in some individuals of P megistus and T infestans.
However, the results of Díaz et al24 may be equivocal because the V3-V4 hypervariable region of 16S rDNA gives a less accurate quantitative description than the V1-V3 region used by Gumiel et al,8 particularly for Staphylococcus. For this genus, Zheng et al26 reported differences of 10 to 300 times between expected and observed frequencies with the V3-V4 region, whereas the measure obtained with the V1-V3 region was close to the expected value.

The GC level of a genome is an interesting variable to consider because it is robust: it is expected to be globally conserved at the family rank.13 Thus, if a GC-poor bacterial species belongs to a given family, another species of that family is also likely to be GC-poor. The same reasoning applies to GC-rich organisms. In special situations, however, this tendency is violated. In endosymbiosis, for example, the selective constraints differ from those normally encountered by the members of the family, and endosymbionts are generally GC-poor regardless of the family they belong to. The GC-poor trait of endosymbionts may be due to an evolutionary convergence induced by the peculiar constraints of the intracellular environment, although this is debatable.27 The luminal environment of the DTT, in which T cruzi thrives, is very different from that of intestinal epithelial cells. This difference was sufficient to eliminate the endosymbionts Arsenophonus, Wolbachia, Candidatus, and Rohrkolberia from the present analysis.

Figure 1. Relative frequency of shotgun sequences associated with bacteria found in the digestive tract of triatomines. (A) Size of reads obtained by 454 Titanium technology, (B) contig size after successive read assembly with Velvet and CAP3, (C) size of ORFs extracted from read contigs, and (D) size of coding ORFs after UFM filtering. cORFs indicate coding open reading frames; ORFs, open reading frames.
Predominant bacterial species are GC-rich,8 which raises the question of whether a cause-and-effect relationship exists between a bacterial species being GC-rich and its growth success28,29 in the DTT environment. The positive correlation between GC3 and genome size (Table 3) suggests that, in the DTT, bacterial species with GC-rich genomes potentially have a more complex metabolism than those with GC-poor genomes, which would be an advantage in this niche. The most significant differences we found between the bacterial groups were due to oxidoreductases. These enzymes are much more numerous in GC-rich than in GC-poor bacteria and seem to confer a metabolic advantage to GC-rich bacteria in an environment such as blood; this is particularly true of nitrate reductases and oxygenases. In the reaction catalyzed by nitrate reductases (EC 1.7.99.4, KEGG map 910),30 the electron transport system is similar to that of aerobic respiration.31,32 It can be complemented by vitamin K to generate the energy required to survive in anaerobic conditions.33 Oxygenases are enzymes that oxidize a substrate by transferring oxygen from molecular oxygen (O2) to it. Dioxygenases incorporate both oxygen atoms of O2 into the substrate,34 whereas monooxygenases, such as phenolases (cytochrome P450 oxidases), incorporate only one atom of molecular oxygen into a substrate, such as a phenol, and reduce the other atom to H2O.35 Oxygenases are common in soil bacteria because oxygen reactivity plays important roles in the degradation of complex substrates. In particular, ring-cleaving dioxygenases catalyze key reactions in the aerobic microbial degradation of aromatic compounds, and many pathways converge to catecholic intermediates.
An example of the degradation of a complex substrate is provided by catechol 2,3-dioxygenase (EC 1.13.11.2). This enzyme catalyzes the opening of the benzene ring (KEGG maps 361, 362, 622, 643) and converts catechol into 2-hydroxymuconate semialdehyde (OHC-R-COOH).36,37 Some ring-cleaving dioxygenases that are active on aromatic compounds belong to the cupin superfamily. Cupin-type dioxygenases also include quercetinases (flavonol 2,4-dioxygenases), which cleave 2 C-C bonds of the heterocyclic ring of quercetin, a widespread plant flavonol.38 Several other enzymes involved in ring modification or heteroatom oxidation are also available in GC-rich bacteria:

(1) arylsulfatases (EC 3.1.6.1);
(2) benzaldehyde dehydrogenases (NADP+) (EC 1.2.1.7);39
(3) flavin-containing monooxygenases (EC 1.14.13.8), which can oxidize a wide array of heteroatoms in structurally diverse xenobiotic substrates, particularly soft nucleophiles such as amines, sulfides, and phosphites, to facilitate their excretion.40,41

In the DTT, oxygenases have been shown to give the bacteria that encode them access to iron through hemoglobin degradation by heme oxygenase (biliverdin-producing) (EC 1.14.99.3).42 Heme oxygenase catalyzes the degradation of heme, producing biliverdin, iron, and carbon monoxide.43–45 Biliverdin is subsequently converted to bilirubin by biliverdin reductases. Iron is an essential nutrient required for the survival of most bacteria.46 In many environments, such as soil or sea water, the bioavailability of iron is limited by the very low solubility of the Fe3+ ion. Microbes release siderophores to scavenge iron from these mineral phases by forming soluble Fe3+ complexes that can be taken up by active transport mechanisms. Many siderophores are nonribosomal peptides,47,48 although several are biosynthesized independently of nonribosomal peptide synthetases.
Some pathogenic bacteria, such as S marcescens, can use heme and hemoproteins as iron sources independently of siderophore production. They do so through mechanisms involving outer membrane heme-binding proteins and heme transport systems.49,50 The iron-binding protein transferrin markedly increases S marcescens hemolytic activity.51 Extracellular iron levels within a host are limited, and most free iron is complexed to high-affinity binding proteins such as transferrin. To circumvent this low iron availability, pathogens have developed sophisticated mechanisms to use the host’s iron-containing and heme-containing proteins. In gram-positive bacteria such as Corynebacterium diphtheriae, heme acquisition resembles siderophore-mediated heme transport52 and involves iron-chelating molecules excreted into the bacterial environment. Once heme has been transported across the cell envelope into the cytoplasm, it is degraded by heme oxygenase.53,54

Oxidases reduce molecular oxygen to hydrogen peroxide or water, and dehydrogenases transfer hydrogen from one substance to another. In contrast to oxygenases, both groups are mainly, if not exclusively, involved in energy metabolism. Many of the dehydrogenases predominating in GC-rich bacteria participate in central pathways that are essential to cell success in their environment, such as glycolysis, the tricarboxylic acid cycle, and oxidative phosphorylation.
As shown by Unrean and Srienc,55 a “cell system has a natural tendency to evolve with time towards an asymptotic state with maximum rate of entropy production.” In addition, De Martino et al56 showed that “growth rate can be explained in terms of a trade-off between the higher fitness of fast-growing phenotypes and the higher entropy of slow-growing ones.” From the results of this article and those of Unrean and Srienc55 and De Martino et al,56 it can be inferred that the success of GC-rich compared with GC-poor bacteria in the DTT is likely due to their enhanced ability to metabolize chemically complex substrates. The higher entropy of their metabolic networks may result, at least in part, from the predominance of dehydrogenase functions in central metabolic pathways such as amino acid and nucleotide metabolism. In parallel with these increases in enzymatic functions, a conserved set of CoA enzymes was also found to be predominant in GC-rich bacteria. These enzymes are involved in different pathways, such as the formation of chemical bonds between large molecules (EC 6.2.1.1),57 toxic compound degradation (EC 2.8.3.1, EC 1.2.1.10),58 and fatty acid (EC 6.2.1.3) and amino acid (EC 4.3.3.7)59 metabolism. The higher metabolic activity found in GC-rich bacteria suggests that signaling proteins should also be significantly increased. Indeed, a large difference was found for the nonspecific serine/threonine protein kinases (EC 2.7.11.1), which belong to the transferase family, specifically the protein-serine/threonine kinases. These enzymes transfer phosphate groups to the oxygen atom of a serine or threonine side chain in proteins.
This process is called phosphorylation and is known to regulate most cellular pathways, especially those involved in signal transduction.60

In agreement with De Martino et al,56 the inversion in the relative sizes of the GC-rich and GC-poor bacterial populations found in the shotgun sequencing analysis is not surprising. The rich LB medium favors fast-growing bacteria with small genomes and fewer enzymatic abilities (lower metabolic network entropy). In this study, bacterial culture and DNA amplification were necessary steps to obtain enough DNA for sequencing, but they came at the cost of a bias. This population bias favored GC-poor bacteria and demonstrates the importance of culture-independent techniques for in situ microbiota investigation. In this respect, describing the microbiota composition by 16S rDNA sequencing before any further metagenomic description is probably the best strategy, provided that complete genome sequences are available for the taxa investigated. Complete genome sequences then allow the construction of a model indicating what can reasonably be expected from the experiments. Despite its bias, the shotgun analysis undertaken here shows that the inferences drawn from the present model remain relevant. The species chosen to represent their respective genera therefore appear to be representative in the context of this work, because the proportion of predominant enzymes in the experiments is similar to that of the model.
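The comparison between the shotgun data and the model rests on the two-proportion test reported in the Results: 425 of 2233 cORFs with KEGG hits for the GC-rich representatives versus 1424 of 19 715 for the GC-poor ones, with Zobs = 19. The following is a minimal sketch of that test, assuming the classical pooled two-proportion Z statistic; the article does not specify which variant was used. The function name `two_proportion_z` is illustrative.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z statistic for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# cORFs with a KEGG hit for the Table 6 EC list / cORFs with a best hit
rich_hits, rich_total = 425, 2233    # GC-rich representative species
poor_hits, poor_total = 1424, 19715  # GC-poor representative species

z = two_proportion_z(rich_hits, rich_total, poor_hits, poor_total)
fold = (rich_hits / rich_total) / (poor_hits / poor_total)
Z_CRITICAL = 1.96  # two-sided test, alpha = 0.05
print(f"GC-rich share {rich_hits / rich_total:.1%}, GC-poor share {poor_hits / poor_total:.1%}")
print(f"fold = {fold:.2f}, Z = {z:.1f}, reject H0: {abs(z) > Z_CRITICAL}")
```

With these counts, Z comes out at about 19, consistent with the reported Zobs, and the ratio of the two shares is about 2.6. The pooled form is used because it estimates the standard error under the null hypothesis of equal proportions.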
Shotgun sequencing of microbiota is expensive, and the large amount of data it produces can be difficult to analyze. This is especially true when a eukaryotic vector and its blood meal are involved, because most of the sequences then come from the host gut rather than from the comparatively rare microbiota it contains.61 For instance, the genome of Rhodnius prolixus RproC1 was predicted to be about 733 Mb, whereas the genomes of the bacteria in its digestive tract average only about 4 to 5 Mb.62 Another limitation of shotgun sequencing is that the information it provides on microbiota composition depends on a reference set of microbial genomes that is still small, typically a few thousand genomes.63 In contrast, large numbers of 16S rDNA sequences are available for comparative analyses. For example, RDP Release 11.5 (September 30, 2016) contained 3 356 809 aligned and annotated 16S rDNA sequences (http://rdp.cme.msu.edu/). Across the bacterial domain, the ~1550-bp 16S gene consists of 8 highly conserved regions (U1-U8) and 9 variable regions.64 Identifying the organisms populating a microbial community and their relative abundances is the typical primary objective of investigations based on 16S rDNA amplicon characterization.
Similarity comparison of 16S gene sequences is generally regarded as the gold standard for taxonomic identification, at least at the genus level.65 Characterization of 16S amplicons by DGGE has been a useful technique for rapidly assessing the composition of DTT microbiota and is particularly suitable for a first-pass comparison of multiple samples.61 Sequencing the 16S gene, however, is currently the most common approach to microbial classification.66 The application of next-generation sequencing to microbial ecology has shown that the diversity of microbial populations is significantly higher than previously estimated by traditional culture-based and conventional molecular methods.67 New DNA microarray technologies (PhyloChip, Second Genome Inc, South San Francisco, USA) now support microbiota investigations by 16S rDNA classification and offer the benefit of detecting thousands of genes simultaneously in a single assay.68,69 Core genes enriched in housekeeping functions are also used to complement 16S rDNA-based classification and to improve the resolution of microbial community structure.70
Conclusions
The qualitative and quantitative description of a microbiota, as adapted from Gumiel et al,8 is more precise than a blind metagenomic analysis by DNA shotgun sequencing, as long as complete genome sequences exist for the bacterial genera diagnosed by 16S rDNA. This is precisely the case in the present investigation: most of the bacterial species identified by 16S rDNA sequencing by Gumiel et al8 have sequenced companion species whose genomes can be downloaded from the NCBI server. The most striking differences in overrepresented enzymatic functions between GC-rich bacteria (prominent colonizers of the triatomine digestive tract) and GC-poor ones are, for the most part, due to oxidoreductases.
We conclude that this group of enzymatic functions allows GC-rich bacteria to outcompete GC-poor ones in an environment where fermenting a medium such as fresh blood may require specific metabolic activities, such as iron recycling and oxygen management. In such a context, GC-rich bacteria would have a comparative advantage in colonizing their environment thanks to their more complex enzymatic apparatus, albeit at the cost of a larger genome that is slower to replicate. Invertebrate vectors are thus valuable systems in which to study the properties that may favor one particular microbial community over another.71
Acknowledgements
The authors thank Norman A Ratcliffe for reading and editing the manuscript.
Author Contributions
NC, MG, FFdM, CJdCM, and PA conceived and designed the experiments. NC analyzed the data and wrote the first draft of the manuscript. PA contributed to the writing of the manuscript. All authors reviewed and approved the final manuscript.
Disclosures and Ethics
As a requirement of publication, author(s) have provided to the publisher signed confirmation of compliance with legal and ethical obligations including but not limited to the following: authorship and contributorship, conflicts of interest, privacy and confidentiality, and (where applicable) protection of human and animal research subjects. The authors have read and confirmed their agreement with the ICMJE authorship and conflict of interest criteria. The authors have also confirmed that this article is unique and not under consideration or published in any other publication, and that they have permission from rights holders to reproduce any copyrighted material. Any disclosures are made in this section. The external blind peer reviewers report no conflicts of interest.
REFERENCES
1. World Health Organization. WHO media centre: Chagas disease (American trypanosomiasis). http://www.who.int/mediacentre/factsheets/fs340/en/. Published 2014. Accessed January 9, 2015.
2. Moraes CB, Giardini MA, Kim H, et al. Nitroheterocyclic compounds are more efficacious than CYP51 inhibitors against Trypanosoma cruzi: implications for Chagas disease drug discovery and development. Sci Rep. 2014;4:4703.
3. Quijano-Hernandez I, Dumonteil E. Advances and challenges towards a vaccine against Chagas disease. Hum Vaccin. 2011;7:1184–1191.
4. Azambuja P, Feder D, Garcia ES. Isolation of Serratia marcescens in the midgut of Rhodnius prolixus: impact on the establishment of the parasite Trypanosoma cruzi in the vector. Exp Parasitol. 2004;107:89–96.
5. Castro D, Morales C, Garcia E, Azambuja P. Inhibitory effects of d-mannose on trypanosomatid lysis induced by Serratia marcescens. Exp Parasitol. 2007;115:200–204.
6. Garcia ES, Genta F, Azambuja P, Schaub GA. Interactions between intestinal compounds of triatomines and Trypanosoma cruzi. Trends Parasitol. 2010;26:499–505.
7. da Mota FF, Marinho LP, Moreira CJC, et al. Cultivation-independent methods reveal differences among bacterial gut microbiota in triatomine vectors of Chagas disease. PLoS Negl Trop Dis. 2012;6:e1631.
8. Gumiel M, da Mota FF, Rizzo VS, et al. Characterization of the microbiota in the guts of Triatoma brasiliensis and Triatoma pseudomaculata infected by Trypanosoma cruzi in natural conditions using culture independent methods. Parasite Vector. 2015;8:245.
9. Dennison NJ, Jupatanakul N, Dimopoulos G. The mosquito microbiota influences vector competence for human pathogens. Curr Opin Insect Sci. 2014;3:6–13.
10. Beard CB, Cordon-Rosales C, Durvasula RV. Bacterial symbionts of the triatominae and their potential use in control of Chagas disease transmission. Annu Rev Entomol. 2002;47:123–141.
11. Durvasula RV, Sundaram RK, Kirsch P, et al. Genetic transformation of a corynebacterial symbiont from the Chagas disease vector Triatoma infestans. Exp Parasitol. 2008;119:94–98.
12. Bentley SD, Parkhill J. Comparative genomic structure of prokaryotes. Annu Rev Genet. 2004;38:771–792.
13. Takahashi M, Kryukov K, Saitou N. Estimation of bacterial species phylogeny through oligonucleotide frequency distances. Genomics. 2009;93:525–533.
14. Bohlin J, Snipen L, Hardy SP, et al. Analysis of intra-genomic GC content homogeneity within prokaryotes. BMC Genomics. 2010;11:464.
15. Lightfield J, Fram NR, Ely B. Across bacterial phyla, distantly-related genomes with similar genomic GC content have similar patterns of amino acid usage. PLoS One. 2011;6:e17677.
16. Wu H, Zhang Z, Hu S, Yu J. On the molecular mechanism of GC content variation among eubacterial genomes. Biol Direct. 2012;7:2.
17. Dias JCP, Ramos AN Jr, Gontijo ED, et al. II Consenso Brasileiro em doença de Chagas, 2015. Epidemiol Serv Saúde. 2016;25:7–86.
18. Cuba CAC, Alvarenga NJ, Barretto AC, Marsden PD, Gama MP. Dipetalogaster maximus (Hemiptera, Triatominae) for xenodiagnosis of patients with serologically detectable Trypanosoma cruzi infection. Trans R Soc Trop Med Hyg. 1979;73:524–527.
19. Gouy M, Delmotte S. Remote access to ACNUC nucleotide and protein sequence databases at PBIL. Biochimie. 2008;90:555–562.
20. Zerbino DR, Birney E. Velvet: algorithms for de novo short reads assembly using de Bruijn graphs. Genome Res. 2008;18:821–829.
21. Huang X, Madan A. CAP3: a DNA sequence assembly program. Genome Res. 1999;9:868–877.
22. Carels N, Frias D. Classifying coding DNA with nucleotide statistics. Bioinform Biol Insights. 2009;3:141–154.
23. Carels N, Frias D. A statistical method without training step for the classification of coding frame in transcriptome sequences. Bioinform Biol Insights. 2013;7:35–54.
24. Díaz S, Villavicencio B, Correia N, Costa J, Haag KL. Triatomine bugs, their microbiota and Trypanosoma cruzi: asymmetric responses of bacteria to an infected blood meal. Parasite Vector. 2016;9:636.
25. Hypsa V, Dale D. In vitro culture and phylogenetic analysis of "Candidatus Arsenophonus triatominarum," an intracellular bacterium from the triatomine bug, Triatoma infestans. Int J Syst Bacteriol. 1997;47:1140–1144.
26. Zheng W, Tsompana M, Ruscitto A, et al. An accurate and efficient experimental approach for characterization of the complex oral microbiota. Microbiome. 2015;3:48.
27. Hildebrand F, Meyer A, Eyre-Walker A. Evidence of selection upon genomic GC-content in bacteria. PLoS Genet. 2010;6:e1001107.
28. Ponce de Leon M, de Miranda A, Alvarez-Valin F, Carels N. The purine bias of coding sequences is determined by physicochemical constraints on proteins. Bioinform Biol Insights. 2014;8:93–108.
29. Carels N, Ponce de Leon M. An interpretation of the ancestral codon from Miller's amino acids and nucleotide correlations in modern coding sequences. Bioinform Biol Insights. 2015;9:37–47.
30. Tavares P, Pereira AS, Moura JJ, Moura I. Metalloenzymes of the denitrification pathway. J Inorg Biochem. 2006;100:2087–2100.
31. Chen J, Strous M. Denitrification and aerobic respiration, hybrid electron transport chains and co-evolution. Biochim Biophys Acta. 2013;1827:136–144.
32. Segers FH, Kešnerová L, Kosoy M, Engel P. Genomic changes associated with the evolutionary transition of an insect gut symbiont into a blood-borne pathogen. ISME J. 2017;11:1232–1244. doi:10.1038/ismej.2016.201.
33. Kwon O, Bhattacharyya DK, Meganathan R. Menaquinone (vitamin K2) biosynthesis: overexpression, purification, and properties of o-succinylbenzoyl-coenzyme A synthetase from Escherichia coli. J Bacteriol. 1996;178:6778–6781.
34. Hayaishi O. An odyssey with oxygen. Biochem Bioph Res Co. 2005;338:2–6.
35. Waterman MR. Professor Howard Mason and oxygen activation. Biochem Bioph Res Co. 2005;338:7–11.
36. Junker F, Field JA, Bangerter F, et al. Dioxygenation and spontaneous deamination of 2-aminobenzene sulphonic acid in Alcaligenes sp. strain O-1 with subsequent meta ring cleavage and spontaneous desulphonation to 2-hydroxymuconic acid. Biochem J. 1994;300:429–436.
37. Kobayashi S, Hayaishi O. Anthranilic acid conversion to catechol (Pseudomonas). Methods Enzymol. 1970;17:505–510.
38. Fetzner S. Ring-cleaving dioxygenases with a cupin fold. Appl Environ Microb. 2012;78:2505–2514.
39. Pastrorova I, de Koster CG, Boom JJ. Analytical study of free and ester bound benzoic and cinnamic acids of gum benzoin resins by GC-MS and HPLC-frit FAB-MS. Phytochem Anal. 1997;8:63–73.
40. Ziegler D. An overview of the mechanism, substrate specificities, and structure of FMOs. Drug Metab Rev. 2002;34:503–511.
41. Eswaramoorthy S, Bonanno JB, Burley SK, Swaminathan S. Mechanism of action of a flavin-containing monooxygenase. Proc Natl Acad Sci U S A. 2006;103:9832–9837.
42. Wilks A, Schmitt MP. Expression and characterization of a heme oxygenase (Hmu O) from Corynebacterium diphtheriae. Iron acquisition requires oxidative cleavage of the heme macrocycle. J Biol Chem. 1998;273:837–841.
43. Kikuchi G, Yoshida T, Noguchi M. Heme oxygenase and heme degradation. Biochem Bioph Res Co. 2005;338:558–567.
44. Ryter SW, Alam J, Choi AMK. Heme oxygenase-1/carbon monoxide: from basic science to therapeutic applications. Physiol Rev. 2006;86:583–650.
45. Evans JP, Niemevz F, Buldain G, de Montellano PO. Isoporphyrin intermediate in heme oxygenase catalysis. Oxidation of alpha-meso-phenylheme. J Biol Chem. 2008;283:19530–19539.
46. Wandersman C, Delepelaire P. Bacterial iron sources: from siderophores to hemophores. Annu Rev Microbiol. 2004;58:611–647.
47. Hider RC, Kong X. Chemistry and biology of siderophores. Nat Prod Rep. 2010;27:637–657.
48. Miethke M, Marahiel M. Siderophore-based iron acquisition and pathogen control. Microbiol Mol Biol Rev. 2007;71:413–451.
49. Létoffé S, Ghigo JM, Wandersman C. Iron acquisition from heme and hemoglobin by a Serratia marcescens extracellular protein. Proc Natl Acad Sci U S A. 1994;91:9876–9880.
50. Létoffé S, Ghigo JM, Wandersman C. Secretion of the Serratia marcescens HasA protein by an ABC transporter. J Bacteriol. 1994;176:5372–5377.
51. Poole K, Braun V. Iron regulation of Serratia marcescens hemolysin gene expression. Infect Immun. 1988;56:2967–2971.
52. Fukushima T, Allred BE, Sia AK, Nichiporuk R, Andersen UN, Raymond KN. Gram-positive siderophore-shuttle with iron-exchange from Fe-siderophore to apo-siderophore by Bacillus cereus YxeB. Proc Natl Acad Sci U S A. 2013;110:13821–13826.
53. Matsui T, Furukawa M, Unno M, Tomita T, Ikeda-Saito M. Roles of distal Asp in heme oxygenase from Corynebacterium diphtheriae, HmuO: a water-driven oxygen activation mechanism. J Biol Chem. 2005;280:2981–2989.
54. Lai W, Chen H, Matsui T, et al. Enzymatic ring-opening mechanism of verdoheme by the heme oxygenase: a combined X-ray crystallography and QM/MM study. J Am Chem Soc. 2010;132:12960–12970.
55. Unrean P, Srienc F. Metabolic networks evolve towards states of maximum entropy production. Metab Eng. 2011;13:666–673.
56. De Martino D, Capuani F, De Martino A. Growth against entropy in bacterial metabolism: the phenotypic trade-off behind empirical growth rate distributions in E. coli. Phys Biol. 2016;13:036005. doi:10.1088/1478-3975/13/3/036005.
57. Gardner JG, Grundy FJ, Henkin TM, Escalante-Semerena JC. Control of acetyl-coenzyme A synthetase (AcsA) activity by acetylation/deacetylation without NAD(+) involvement in Bacillus subtilis. J Bacteriol. 2006;188:5460–5468.
58. Manjasetty BA, Powlowski J, Vrielink A. Crystal structure of a bifunctional aldolase-dehydrogenase: sequestering a reactive and volatile intermediate. Proc Natl Acad Sci U S A. 2003;100:6992–6997.
59. Mirwaldt C, Korndörfer I, Huber R. The crystal structure of dihydrodipicolinate synthase from Escherichia coli at 2.5 Å resolution. J Mol Biol. 1995;246:227–239.
60. Wolanin PW, Thomason PA, Stock JB. Histidine protein kinases: key signal transducers outside the animal kingdom. Genome Biol. 2002;3:reviews3013. doi:10.1186/gb-2002-3-10-reviews3013.
61. Fraher MH, O'Toole PW, Quigley EM. Techniques used to characterize the gut microbiota: a guide for the clinician. Nat Rev Gastroenterol Hepatol. 2012;9:312–322.
62. Mesquita RD, Vionette-Amaral RJ, Lowenberger C, et al. Genome of Rhodnius prolixus, an insect vector of Chagas disease, reveals unique adaptations to hematophagy and parasite infection. Proc Natl Acad Sci U S A. 2015;112:14936–14941.
63. Markowitz VM, Chen IM, Palaniappan K, et al. IMG: the Integrated Microbial Genomes database and comparative analysis system. Nucleic Acids Res. 2012;40:D115–D122.
64. Jonasson J, Olofsson M, Monstein HJ. Classification, identification and subtyping of bacteria based on pyrosequencing and signature matching of 16S rDNA fragments. APMIS. 2002;110:263–272.
65. Clarridge JE III. Impact of 16S rRNA gene sequence analysis for identification of bacteria on clinical microbiology and infectious diseases. Clin Microbiol Rev. 2004;17:840–862.
66. Spiegelman D, Whissell G, Greer CW. A survey of the methods for the characterization of microbial consortia and communities. Can J Microbiol. 2005;51:355–386.
67. Yun JH, Roh SW, Whon TW, et al. Insect gut bacterial diversity determined by environmental habitat, diet, developmental stage, and phylogeny of host. Appl Environ Microb. 2014;80:5254–5264.
68. Gentry TJ, Wickham GS, Schadt CW, He Z, Zhou J. Microarray applications in microbial ecology research. Microb Ecol. 2006;52:159–175.
69. DeSantis TZ, Brodie EL, Moberg JP, Zubieta IX, Piceno YM, Andersen GL. High-density universal 16S rRNA microarray analysis reveals broader diversity than typical clone library when sampling the environment. Microb Ecol. 2007;53:371–383.
70. Segata N, Huttenhower C. Toward an efficient method of identifying core genes for evolutionary and functional microbial phylogenies. PLoS One. 2011;6:e24704.
71. Newton ILG, Sheehan KB, Lee FJ, Horton MA, Hicks RD. Invertebrate systems for hypothesis-driven microbiome research. Microbiome Sci Med. 2013;1:1–9. doi:10.2478/micsm-2013-0001.