Software Tool Article

Explain your data by Concept Profile Analysis Web Services

[version 1; peer review: 2 approved with reservations]
PUBLISHED 25 Jul 2014

This article is included in the Bioinformatics gateway.

Abstract

The Concept Profile Analysis technology (overlapping co-occurring concept sets based on knowledge contained in biomedical abstracts) has led to new biomedical discoveries, and users have been able to interact with concept profiles through the interactive tool “Anni” (http://biosemantics.org/anni). However, Anni provides no way for users to save their procedures, results, or related provenance. Here we present a new suite of Web Service operations that allows bioinformaticians to design and execute their own Concept Profile Analysis workflow, possibly as part of a larger bioinformatics analysis. The source code can be downloaded from ZENODO at http://www.dx.doi.org/10.5281/zenodo.10963.

Introduction

Concept Profile Analysis (CPA) has proven a powerful tool for interpreting and prioritizing the results of bioinformatics analyses, and for linking data sets based on the best “educated guess” when precise links are not available. The technology uses the vector space model to relate concepts (such as genes and biological processes) mined from the literature to each other. We call these vectors “concept profiles”. Vectors can be compared efficiently and transparently [1], and the model yields a measure of the strength of the relationship between concepts. The CPA algorithms have, for example, been successfully applied to compare microarray studies [2], to predict proteins putatively associated with muscular dystrophy pathways [3], and to associate chemical structures with gene expression data [4].
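To make the vector comparison concrete, the following is a minimal sketch in Python. It assumes that a concept profile can be held as a sparse mapping from concept ID to weight and that two profiles are compared with cosine similarity; the text does not specify the weighting scheme or the similarity measure used by CPA, so both are assumptions, and the concept IDs and weights are invented for illustration.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two sparse concept profiles.

    Each profile is a dict mapping a concept ID to a weight.
    The choice of cosine similarity is an assumption for illustration.
    """
    # Only concepts present in both profiles contribute to the dot product.
    shared = profile_a.keys() & profile_b.keys()
    dot = sum(profile_a[c] * profile_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in profile_a.values()))
    norm_b = math.sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as unrelated to everything
    return dot / (norm_a * norm_b)


# Hypothetical profiles: the IDs and weights are made up.
gene_profile = {"C1": 0.8, "C2": 0.6}
process_profile = {"C2": 0.6, "C3": 0.8}
score = cosine_similarity(gene_profile, process_profile)
print(round(score, 2))
```

Because only the shared concepts enter the dot product, the score can be traced back to the individual concepts that produced it, which is one way to read the "transparent" comparison mentioned above.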

The standalone application Anni [10] supports a number of standard CPA operations. For example, to perform pathway analysis for a gene expression experiment, a user first provides a list of gene database identifiers for the most significantly expressed genes. Anni uses these identifiers to query the concept profile database for the corresponding concept profiles, and subsequently constructs a “concept set” of these profiles. To match the list of genes with pathways, the user performs the operation “match concept sets” for the gene concept set with a predefined concept set of the category “Gene Ontology (GO) biological process”. Note that we refer here to GO concept profiles. Anni calculates the concept profile matching scores between the two concept sets, resulting in a ranked list of GO biological processes for the gene list. Finally, Anni can retrieve literature evidence from a supporting documents database. This evidence consists either of documents that co-mention the gene and the biological process, or of documents that provide enough statistical evidence to support the gene-biological process association without mentioning the gene and the biological process together in an abstract.
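The “match concept sets” step can be sketched as follows. This is an illustration only: it assumes profiles are sparse dicts, scores a pair of profiles with an inner product, and ranks each predefined concept by its mean score over the query set. The text does not state how Anni aggregates scores, so the averaging, the function names, and the toy data are all assumptions.

```python
def inner_product(profile_a, profile_b):
    """Inner product of two sparse concept profiles (dict: concept ID -> weight)."""
    return sum(w * profile_b[c] for c, w in profile_a.items() if c in profile_b)


def match_concept_sets(query_set, match_set):
    """Rank every concept in match_set against all profiles in query_set.

    Both arguments map a concept name to its profile. The mean score over
    the query set is an assumed aggregation, not Anni's documented one.
    """
    if not query_set:
        return []
    ranked = []
    for name, profile in match_set.items():
        scores = [inner_product(q, profile) for q in query_set.values()]
        ranked.append((name, sum(scores) / len(scores)))
    # Highest-scoring concepts first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical gene and GO biological process profiles.
genes = {"geneA": {"c1": 1.0, "c2": 2.0}, "geneB": {"c2": 1.0}}
go_processes = {
    "process1": {"c2": 1.0},
    "process2": {"c1": 1.0},
    "process3": {"c9": 1.0},
}
ranking = match_concept_sets(genes, go_processes)
for name, score in ranking:
    print(name, score)
```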

Here we present a new suite of Web Service (WS) operations that allows bioinformaticians to design and execute their own CPA workflow outside the Anni Web tool, possibly as part of a larger bioinformatics analysis. The WS was designed according to the outcome of an Anni usage analysis, in which the common user and machine operations were identified.

Technical specifications

We implemented the CPA WS in Java, using the Spring Model-View-Controller (MVC) framework and Apache Tomcat, and following the Java API for XML Web Services (JAX-WS) specification. We compiled the Anni Java code for the different operations into separate libraries, for which we wrote wrappers in Java. Spring MVC serves as the WS interface to remote applications. Following the JAX-WS standard enables an auto-generated WSDL specification and the use of Java annotations to specify operations. Apache Tomcat is used for deployment. The CPA WS uses a database of indexed PubMed records. The thesaurus behind the Anni Web application was converted to Simple Knowledge Organization System (SKOS), and the SKOS concept IDs were implemented as resolvable Uniform Resource Identifiers leading to a Virtuoso Universal Server triple store.

User and machine operations as Taverna workflows

As an example of how to work with the CPA WS, we implemented several workflows in the workflow management system Taverna workbench v 2.4 [5], following the best practices for workflow design [6]. The whole suite of CPA workflows consists of 11 workflows collected in a myExperiment pack [http://www.myexperiment.org/packs/368]. These workflows are of two different types: 1) nine workflows calling one WS operation, and 2) two pipelines of nested workflows calling more than one WS operation. The workflows of type 1 are the building blocks for pipelines of type 2, and were implemented with re-usability in mind.

Here we describe the workflow “Match concept profiles with predefined set” (Figure 1) in order to illustrate the design and use of the WS and workflows. The workflow invokes the WS operation “getSimilarConceptProfilesPredefined”. The operation takes three input parameters, which can be accessed using the XML splitter function in Taverna. The user specifies the concept(s) to be matched (“Query concept IDs”), the concept set to match against (“Match concept set”), and a cutoff number of matched concepts to return (“Cutoff”).
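Outside Taverna, the same operation could in principle be invoked by sending a SOAP request directly. The sketch below only builds such a request in Python and does not contact any server. The operation name “getSimilarConceptProfilesPredefined” comes from the text; the service namespace and the parameter element names (`queryConceptId`, `matchConceptSet`, `cutoff`) are hypothetical placeholders, and the real ones would have to be taken from the service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
CPA_NS = "http://example.org/cpa"  # hypothetical; use the namespace from the WSDL


def build_request(query_concept_ids, match_concept_set, cutoff):
    """Build a SOAP envelope for getSimilarConceptProfilesPredefined.

    The child element names are assumptions for illustration only.
    """
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(
        body, f"{{{CPA_NS}}}getSimilarConceptProfilesPredefined"
    )
    # One element per query concept ("Query concept IDs").
    for concept_id in query_concept_ids:
        ET.SubElement(operation, "queryConceptId").text = str(concept_id)
    # The predefined set to match against ("Match concept set").
    ET.SubElement(operation, "matchConceptSet").text = match_concept_set
    # Maximum number of matched concepts to return ("Cutoff").
    ET.SubElement(operation, "cutoff").text = str(cutoff)
    return ET.tostring(envelope, encoding="unicode")


# Placeholder values; real concept IDs and set names come from the service.
request_xml = build_request(["1234"], "example-concept-set", 10)
print(request_xml)
```

The three arguments mirror the three inputs that the XML splitter exposes in the Taverna workflow, so the sketch shows what the splitter assembles behind the scenes rather than a replacement for it.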


Figure 1. Taverna workflow for matching concept(s) with a predefined set of concept profiles.

Blue boxes represent the workflow inputs and outputs, the green box the WS invocation, and purple boxes the XML splitters for the inputs and outputs of the WS operation. The workflow is available at http://www.myexperiment.org/workflows/3396.

Opening the “Run workflow” window in Taverna shows the structured annotations for the whole workflow and the input parameters, as well as the example values (Figure 2). WS functional annotations can be accessed via the “Details” tab in Taverna (Figure 3). When the workflow is run, it produces a ranked list of concepts associated with the query concept(s), together with their similarity scores.


Figure 2. Taverna run window.

Detailed, structured descriptions for the whole workflow and its input parameters, with example values, are shown in the window.


Figure 3. Taverna details window.

A detailed description of the function of the WS operation is shown in the window.

The workflow described above executes the core functionality of concept profile matching. The other WS operations implement functionality such as explaining the association found (by listing the common concepts contributing most to the score) and showing the literature evidence (by retrieving the links to the abstracts in PubMed). Workflows implementing these WS operations can be coupled to the “Match concept profiles with predefined set” workflow to form a pipeline of nested workflows. Examples of such pipelines are the “GWAS to biomedical concept” nested workflow, which performs Single Nucleotide Polymorphism (SNP) annotation, and the “Annotate gene list with top ranking concepts” nested workflow for gene annotation (Figure 4).
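The idea of explaining an association through the common concepts that contribute most to the score can be sketched as below. The sketch assumes the matching score is an inner product of two sparse profiles, so that each shared concept contributes the product of its two weights; the actual scoring used by the WS is not given in the text, and the function name and data are hypothetical.

```python
def top_contributors(profile_a, profile_b, n=3):
    """Return the n shared concepts that contribute most to an inner-product score.

    Profiles are dicts mapping a concept ID to a weight. Treating the
    contribution as a product of weights is an assumption for illustration.
    """
    shared = profile_a.keys() & profile_b.keys()
    contributions = {c: profile_a[c] * profile_b[c] for c in shared}
    # Largest contributions first, truncated to the requested number.
    return sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical profiles for a gene and a biological process.
gene = {"c1": 0.5, "c2": 0.2, "c3": 0.1}
process = {"c1": 0.4, "c3": 0.9, "c4": 1.0}
explanation = top_contributors(gene, process)
for concept, contribution in explanation:
    print(concept, contribution)
```

Concepts that occur in only one of the two profiles (here `c2` and `c4`) never appear in the explanation, because they add nothing to the score.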


Figure 4. Taverna nested workflow for gene annotation.

Blue boxes represent input and output parameters, purple boxes the local Taverna worker services, yellow boxes the XPath services for fast XML parsing, and grey boxes the constant values. The workflow is available at http://www.myexperiment.org/workflows/3921.

Discussion

Compared to Anni, the CPA WS and workflows raise the level of reproducibility of bioinformatics experiments that make use of CPA, and the CPA WS can more easily be used together with other tools. For example, CPA-based SNP annotation can be performed with the CPA WS by coupling an external WS to map the SNP identifiers to Entrez gene identifiers [7]. With Anni, the mapping from SNP to Entrez gene identifiers would have to be performed separately, reducing reproducibility.

Some of the functionalities in Anni have not been migrated to the WS. For example, Anni provides a function for hierarchical clustering of the results. Clustering is not a CPA function by itself, but we are considering implementing workflows that perform this function. We are also working on a workflow implementation of the process that creates the data underlying the Anni WS, possibly using the recently developed text-mining workbench Argo [8], allowing for more flexibility in performing CPA [9]. Specialization of the underlying resources for services to use in specific research domains, such as plant breeding or metabolomics, is a topic for future work.

Conclusions

By creating a WS that builds upon the Anni interactive tool, we have made the CPA technology available in a way that lets users more easily integrate it with other software and save their procedures, results and related provenance.

Software availability

Archived source code as at the time of publication

http://www.dx.doi.org/10.5281/zenodo.10963 [11]

Software license

Apache 2.0

how to cite this article
Hettne K, van Schouwen R, Mina E et al. Explain your data by Concept Profile Analysis Web Services [version 1; peer review: 2 approved with reservations]. F1000Research 2014, 3:173 (https://doi.org/10.12688/f1000research.4830.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses:
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 18 Sep 2014
Naoaki Okazaki, Department of System Information Sciences, Tohoku University, Sendai, Japan 
Approved with Reservations
This paper presents a suite of Web services with which users can design and execute their own workflow for Concept Profile Analysis. The paper includes examples of running the Web services on Taverna workbench.
 
The motivation of promoting the Web services ...
HOW TO CITE THIS REPORT
Okazaki N. Reviewer Report For: Explain your data by Concept Profile Analysis Web Services [version 1; peer review: 2 approved with reservations]. F1000Research 2014, 3:173 (https://doi.org/10.5256/f1000research.5155.r5805)
Reviewer Report 05 Aug 2014
Karin Verspoor, Computing and Information Systems Department, University of Melbourne, Melbourne, Australia 
Approved with Reservations
This article describes a suite of services that bring the previously existing concept profile analysis tool "Anni" to the web. These web services enable a range of bioinformatics analysis tasks, and can be embedded in other workflows. The authors provide ...
HOW TO CITE THIS REPORT
Verspoor K. Reviewer Report For: Explain your data by Concept Profile Analysis Web Services [version 1; peer review: 2 approved with reservations]. F1000Research 2014, 3:173 (https://doi.org/10.5256/f1000research.5155.r5585)