The main topic of this dissertation involves cluster-based computing, specifically relating to computations performed on Beowulf clusters. I have developed a light-weight library for dynamic interoperable message passing, called the InterCluster Interface (ICI). This library not only supports computations performed over multiple clusters that are running different Message Passing Interface (MPI) implementations, but also can be used independently of MPI. In addition I developed the Backtracking Framework (BkFr) that simplifies implementations of the parallel backtracking paradigm in the single cluster environment, and supports the extension of computations over multiple clusters. BkFr uses MPI for the intra-cluster communication and ICI for the inter-cluster communication. I have also developed a template-based library of programming modules that facilitate the introduction of the rapidly emerging message passing parallel computing paradigm in upper-division undergraduate courses. A...
The development of targeted treatment options for precision medicine is hampered by a slow and costly process of drug screening. While small molecule docking simulations are often applied in conjunction with cheminformatic methods to reduce the number of candidate molecules to be tested experimentally, the current approaches suffer from high false positive rates and are computationally expensive. Here, we present a novel in silico approach for drug discovery and repurposing, dubbed connectivity enhanced Structure Activity Relationship (ceSAR) that improves on current methods by combining docking and virtual screening approaches with pharmacogenomics and transcriptional signature connectivity analysis. ceSAR builds on the landmark LINCS library of transcriptional signatures of over 20,000 drug-like molecules and ~5,000 gene knock-downs (KDs) to connect small molecules and their potential targets. For a set of candidate molecules and specific target gene, candidate molecules are first...
Rapid progress in proteomics and large-scale profiling of biological systems at the protein level necessitates the continued development of efficient computational tools for the analysis and interpretation of proteomics data. Here, we present the piNET server that facilitates integrated annotation, analysis and visualization of quantitative proteomics data, with emphasis on PTM networks and integration with the LINCS library of chemical and genetic perturbation signatures in order to provide further mechanistic and functional insights. The primary input for the server consists of a set of peptides or proteins, optionally with PTM sites, and their corresponding abundance values. Several interconnected workflows can be used to generate: (i) interactive graphs and tables providing comprehensive annotation and mapping between peptides and proteins with PTM sites; (ii) high resolution and interactive visualization for enzyme-substrate networks, including kinases and their phospho-peptide...
Journal of the American Medical Informatics Association
Objective: To create an online resource that informs the public of COVID-19 outbreaks in their area. Materials and Methods: This R Shiny application aggregates data from multiple resources that track COVID-19 and visualizes them through an interactive, online dashboard. Results: The web resource, called the COVID-19 Watcher, can be accessed at https://covid19watcher.research.cchmc.org/. It displays COVID-19 data from every county and 188 metropolitan areas in the U.S. Features include rankings of the worst affected areas and auto-generating plots that depict temporal changes in testing capacity, cases, and deaths. Discussion: The Centers for Disease Control and Prevention (CDC) do not publish COVID-19 data for local municipalities, so it is critical that academic resources fill this void so the public can stay informed. The data used have limitations and likely underestimate the scale of the outbreak. Conclusions: The COVID-19 Watcher can provide the public with real-time updates of outb...
There are only a few platforms that integrate multiple omics data types, bioinformatics tools, and interfaces for integrative analyses and visualization that do not require programming skills. Among these, iLINCS is unique in scope and versatility of the data provided and the analytics facilitated. iLINCS (http://ilincs.org) is an integrative web-based platform for analysis of omics data and signatures of cellular perturbations. The platform facilitates analysis of user-submitted omics signatures of diseases and cellular perturbations in the context of a large compendium of pre-computed signatures (>200,000), as well as mining and re-analysis of the large collection of omics datasets (>12,000), pre-computed signatures, and their connections. Analytics workflows driven by user-friendly interfaces enable users with only conceptual understanding of the analysis strategy to execute sophisticated analyses of omics signatures, such as systems biology analyses and interpretation of s...
Large proteomics data, including those generated by mass spectrometry, are being generated to characterize biological systems at the protein level. Computational methods and tools to identify and quantify peptides, proteins and post-translational modifications (PTMs) that are captured in modern mass spectrometers have matured over the years. On the other hand, tools for downstream analysis, interpretation and visualization of proteomics data sets, in particular those involving PTMs, require further improvement and integration to accelerate scientific discovery and maximize the impact of proteomics studies by connecting them better with biological knowledge across not only proteomics, but also other Omics domains. With the goal of addressing these challenges, the piNET server has been developed as a versatile web platform to facilitate mapping, annotation, analysis and visualization of peptide, PTM, and protein level quantitative data generated by either targeted, shotgun or ...
The Implicit Association Test (IAT) is widely used in psychology. Unfortunately, the IAT cannot be run within online surveys, requiring survey researchers to outsource them to third-parties. We introduce a novel method for constructing IATs using online surveys; we then empirically assess its validity. Study 1 (student n = 239) found good psychometric properties, expected IAT effects, and expected correlations with explicit measures for survey-based IATs. Study 2 (MTurk n = 818) found predicted IAT effects across four survey-based IATs (d’s = 0.82 [Black-White IAT] to 2.13 [insect-flower IAT]). Study 3 (MTurk n = 270) compared survey-based and IATs run via third-party software, yielding nearly identical results and intercorrelations expected for identical IATs. Survey-based IATs appear reliable and valid, offer numerous advantages, and make IATs accessible for online researchers. We present all materials, links to tutorials, and an open-source tool that rapidly automates survey-base...
The vast amount of RNA-seq data deposited in Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) is still a grossly underutilized resource for biomedical research. To remove technical roadblocks for reusing these data, we have developed a web-application GREIN (GEO RNA-seq Experiments Interactive Navigator) which provides user-friendly interfaces to manipulate and analyze GEO RNA-seq data. GREIN is powered by the back-end computational pipeline for uniform processing of RNA-seq data and the large number (>6,500) of already processed datasets. The front-end user interfaces provide a wealth of user-analytics options including sub-setting and downloading processed data, interactive visualization, statistical power analyses, construction of differential gene expression signatures and their comprehensive functional characterization, and connectivity analysis with LINCS L1000 data. The combination of the massive amount of back-end data and front-end analytics options driven b...
The Library of Integrated Network-Based Cellular Signatures (LINCS) is an NIH Common Fund program that catalogs how human cells globally respond to chemical, genetic, and disease perturbations. Resources generated by LINCS include experimental and computational methods, visualization tools, molecular and imaging data, and signatures. By assembling an integrated picture of the range of responses of human cells exposed to many perturbations, the LINCS program aims to better understand human disease and to advance the development of new therapies. Perturbations under study include drugs, genetic perturbations, tissue micro-environments, antibodies, and disease-causing mutations. Responses to perturbations are measured by transcript profiling, mass spectrometry, cell imaging, and biochemical methods, among other assays. The LINCS program focuses on cellular physiology shared among tissues and cell types relevant to an array of diseases, including cancer, heart disease, and neurodegenera...
Papers by Michal Kouril