NOT PEER-REVIEWED
"PeerJ Preprints" is a venue for early communication or feedback before peer review. Data may be preliminary.

A peer-reviewed article of this Preprint also exists.


Supplemental Information

Figure 1. The two layers of the FAIR Accessor

Inspired by the LDP Container, there are two resources in the FAIR Accessor. The first resource is a Container, which responds to an HTTP GET request by providing FAIR metadata about a composite research object, and optionally a list of URLs representing MetaRecords that describe individual components within the collection. The MetaRecord resources resolve by HTTP GET to documents containing metadata about an individual data component and, optionally, a set of links structured as DCAT Distributions that lead to various representations of that data.

DOI: 10.7287/peerj.preprints.2522v2/supp-1
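The two-layer resolution described for Figure 1 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `RESOURCES` dictionary stands in for HTTP GET, and every URL and field name (`contains`, `distributions`, `mediaType`, `downloadURL`) is an assumption chosen for readability rather than the actual LDP or DCAT serialization.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the web; all URLs and keys are illustrative.
RESOURCES: Dict[str, dict] = {
    "http://example.org/accessor": {
        "type": "Container",
        "metadata": {"title": "Example composite research object"},
        # Optional list of MetaRecord URLs, one per component of the collection.
        "contains": [
            "http://example.org/accessor/rec1",
            "http://example.org/accessor/rec2",
        ],
    },
    "http://example.org/accessor/rec1": {
        "type": "MetaRecord",
        "metadata": {"title": "Record 1"},
        # Optional DCAT-style distributions: one per representation of the data.
        "distributions": [
            {"mediaType": "text/html",
             "downloadURL": "http://example.org/data/rec1.html"},
            {"mediaType": "application/rdf+xml",
             "downloadURL": "http://example.org/data/rec1.rdf"},
        ],
    },
    "http://example.org/accessor/rec2": {
        "type": "MetaRecord",
        "metadata": {"title": "Record 2"},
        "distributions": [],
    },
}


def http_get(url: str) -> dict:
    """Simulated HTTP GET: look the resource up in the in-memory table."""
    return RESOURCES[url]


def list_distributions(container_url: str) -> Dict[str, List[str]]:
    """Resolve the Container, then each MetaRecord, collecting download URLs."""
    container = http_get(container_url)
    result: Dict[str, List[str]] = {}
    for meta_url in container.get("contains", []):
        meta_record = http_get(meta_url)
        result[meta_url] = [
            d["downloadURL"] for d in meta_record.get("distributions", [])
        ]
    return result
```

Calling `list_distributions("http://example.org/accessor")` walks from the Container to each MetaRecord and returns the download URLs grouped by MetaRecord. Both the MetaRecord list and the distribution list are read with a default of "empty" because the caption describes both as optional.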

Figure 2. Diagram of the structure of an exemplar Triple Descriptor representing a hypothetical record of a SNP in a patient’s genome

In this descriptor, the Subject will have the URL structure http://example.org/patient/{id}, and the Subject is of type PatientRecord. The Predicate is hasVariant, and the Object will have URL structure http://identifiers.org/dbsnp/{snp} with the rdf:type from the sequence ontology “0000694” (which is the concept of a “SNP”). The two nodes shaded green are of the same ontological type, showing the iterative nature of RML, and how individual RML Triple Descriptors will be concatenated into full FAIR Profiles. The three nodes shaded yellow are the nodes that define the subject type, predicate and object type of the triple being described.

DOI: 10.7287/peerj.preprints.2522v2/supp-2
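The information carried by the Triple Descriptor of Figure 2 can be pictured as a small data structure. The sketch below is a deliberately simplified stand-in for RML, not RML itself: the class name `TripleDescriptor`, its field names, and the `instantiate` helper are hypothetical, and the type and predicate names are kept as the bare labels used in the caption because the caption does not give their namespaces.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TripleDescriptor:
    """Simplified, non-RML view of one Triple Descriptor (illustrative only)."""
    subject_template: str   # URL structure of the Subject
    subject_type: str       # ontological type of the Subject
    predicate: str          # the Predicate linking Subject and Object
    object_template: str    # URL structure of the Object
    object_type: str        # ontological type of the Object

    def instantiate(self, **values: str) -> Tuple[str, str, str]:
        """Fill both URL templates and return one (subject, predicate, object)."""
        return (
            self.subject_template.format(**values),
            self.predicate,
            self.object_template.format(**values),
        )


# The descriptor from the caption; "SO:0000694" is written here as a CURIE
# for the sequence ontology term "0000694" (an assumption about notation).
snp_descriptor = TripleDescriptor(
    subject_template="http://example.org/patient/{id}",
    subject_type="PatientRecord",
    predicate="hasVariant",
    object_template="http://identifiers.org/dbsnp/{snp}",
    object_type="SO:0000694",
)
```

For example, `snp_descriptor.instantiate(id="42", snp="rs1234")` yields a triple whose subject and object follow the two URL structures given in the caption; the identifier values are made up for the illustration.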

Figure 3. Integration of FAIR Projectors into the FAIR Accessor

Resolving the MetaRecord resource returns a metadata document containing multiple DCAT Distributions for a given record, as in Figure 1. When a FAIR Projector is available, additional DCAT Distributions are included in this metadata document. These Distributions contain a URL (purple text) representing a Projector, and a Triple Descriptor that describes, in RML, the structure and semantics of the Triple(s) that will be obtained from that Projector resource if it is resolved. These Triple Descriptors may be aggregated into FAIR Profiles, based on the Record that they are associated with (Record R, in the figure) to give a full mapping of all available representations of the data present in Record R.

DOI: 10.7287/peerj.preprints.2522v2/supp-3
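The aggregation step mentioned for Figure 3, in which Triple Descriptors are collected into a FAIR Profile for the Record they belong to, can be sketched as a simple grouping. The dictionary keys (`record`, `url`, `tripleDescriptor`) and the example values are hypothetical; the actual metadata is RDF, and a real profile would be a set of RML maps rather than strings.

```python
from collections import defaultdict
from typing import Dict, List


def build_profiles(distributions: List[dict]) -> Dict[str, List[str]]:
    """Group Triple Descriptors by the Record their Distribution describes.

    Distributions without a Triple Descriptor (plain downloads rather than
    Projectors) are skipped.
    """
    profiles: Dict[str, List[str]] = defaultdict(list)
    for dist in distributions:
        descriptor = dist.get("tripleDescriptor")
        if descriptor is not None:
            profiles[dist["record"]].append(descriptor)
    return dict(profiles)


# Illustrative input: one ordinary Distribution and two Projector Distributions.
example_distributions = [
    {"record": "R", "url": "http://example.org/R.html"},
    {"record": "R", "url": "http://example.org/projector/1",
     "tripleDescriptor": "PatientRecord hasVariant SNP"},
    {"record": "R", "url": "http://example.org/projector/2",
     "tripleDescriptor": "PatientRecord hasAge Age"},
]
```

Here `build_profiles(example_distributions)` returns a single profile for Record R containing both descriptors, which mirrors the idea that the profile maps all projected representations available for that Record.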

Figure 4. A representative portion of the output from resolving the Container Resource of the FAIR Accessor, rendered into HTML by the Tabulator Firefox plugin

The three columns show the label of the Subject node of all RDF Triples (left), the label of the URI in the predicate position of each Triple (middle), and the value of the Object position (right), where blue text indicates that the value is a Resource, and black text indicates that the value is a literal.

DOI: 10.7287/peerj.preprints.2522v2/supp-4

Figure 5. A representative (incomplete) portion of the output from resolving the MetaRecord Resource of the FAIR Accessor for record C8V1L6 (at http://linkeddata.systems/Accessors/UniProtAccessor/C8V1L6), rendered into HTML by the Tabulator Firefox plugin

The columns have the same meaning as in Figure 4.

DOI: 10.7287/peerj.preprints.2522v2/supp-5

Figure 6. Turtle representation of the subset of triples from the MetaRecord metadata pertaining to the two DCAT Distributions

Each distribution specifies an available representation (media type), and a URL from which that representation can be downloaded.

DOI: 10.7287/peerj.preprints.2522v2/supp-6

Figure 7. A portion of the output from resolving the MetaRecord Resource of the FAIR Accessor for record C8UZX9, rendered into HTML by the Tabulator Firefox plugin

The columns have the same meaning as in Figure 4. Comparing the structure of this document to that in Figure 5 shows that there are now four values for the “distribution” predicate: an RDF and an HTML representation, as in Figure 5, and two additional distributions with URLs conforming to the TPF design pattern (highlighted).

DOI: 10.7287/peerj.preprints.2522v2/supp-7

Figure 8. Turtle representation of the subset of triples from the MetaRecord metadata pertaining to one of the FAIR Projector DCAT Distributions of the MetaRecord shown in Figure 7

The text is colour-coded to assist in visual exploration of the RDF. The DCAT Distribution blocks of the two Projector distributions (black bold) have multiple media-type representations (red), and are connected by the hasMapping predicate to an RML Map (dark blue), a block of RML that semantically describes the subject, predicate, and object (green, orange, and purple respectively) of the Triple Descriptor for that Projector. This block of RML is schematically diagrammed in Figure 2. The three media-types (red) indicate that the URL will respond to HTTP Content Negotiation, and may return any of those three formats.

DOI: 10.7287/peerj.preprints.2522v2/supp-8
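The HTTP Content Negotiation behaviour noted for Figure 8 can be illustrated with a small server-side sketch that picks one of the offered media types from a client's `Accept` header. This is a simplified approximation under stated assumptions: it honours `q` values and the `*/*` wildcard only, ignores partial wildcards such as `text/*`, and the media types in the usage note are placeholders, since the caption does not name the three formats.

```python
from typing import List, Optional, Tuple


def parse_accept(accept_header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q value) pairs; q defaults to 1."""
    prefs: List[Tuple[str, float]] = []
    for part in accept_header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0]
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((media, q))
    return prefs


def choose_media_type(accept_header: str, offered: List[str]) -> Optional[str]:
    """Return the offered media type with the highest q value, or None."""
    prefs = parse_accept(accept_header)
    best: Optional[str] = None
    best_q = 0.0
    for media in offered:
        for pattern, q in prefs:
            # Exact match or the catch-all wildcard; ties keep the earlier offer.
            if pattern in (media, "*/*") and q > best_q:
                best, best_q = media, q
    return best
```

With a hypothetical Projector offering `text/turtle` and `application/n-triples`, a request carrying `Accept: */*;q=0.1, application/n-triples;q=0.5` would be answered with N-Triples, while a request for `text/html` alone would match nothing and the function returns `None`.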

Figure 9. Data before and after FAIR Projection

Bolded segments show how the URI structure and the semantics of the data were modified, according to the mapping defined in the Triple Descriptor (data_0896 = “Protein report” and data_1176 = “GO Concept ID”). URI structure transformations may be useful for integrative queries against datasets that utilize the Identifiers.org URI scheme, such as OpenLifeData (González et al., 2014). Semantic transformations allow integrative queries across datasets that utilize diverse and redundant ontologies for describing their data, and in this example, may also be used to add semantics where there were none before.

DOI: 10.7287/peerj.preprints.2522v2/supp-9
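The URI structure transformation described for Figure 9 can be approximated with a rewrite rule that maps a source URI pattern onto an Identifiers.org-style URI. The rule below is an assumption made for illustration; the figure's actual before and after URIs are not reproduced here, and the names `REWRITE_RULES` and `project_uri` are hypothetical.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical rewrite rule: (pattern for the source URI, template for the
# projected URI). The specific source pattern is an illustrative assumption.
REWRITE_RULES: List[Tuple[Pattern[str], str]] = [
    (
        re.compile(r"^http://purl\.uniprot\.org/uniprot/(?P<id>[A-Z0-9]+)$"),
        "http://identifiers.org/uniprot/{id}",
    ),
]


def project_uri(uri: str) -> str:
    """Rewrite a URI with the first matching rule; leave it unchanged otherwise."""
    for pattern, template in REWRITE_RULES:
        match = pattern.match(uri)
        if match:
            return template.format(**match.groupdict())
    return uri
```

Under this assumed rule, `project_uri("http://purl.uniprot.org/uniprot/C8V1L6")` returns the Identifiers.org form of the same record, and any URI that matches no rule passes through untouched. The semantic part of the projection (attaching types such as data_0896 or data_1176) is not modelled in this sketch.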

Additional Information

Competing Interests

The authors declare that they have no competing interests.

Author Contributions

Mark D Wilkinson conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, performed the computation work, reviewed drafts of the paper.

Ruben Verborgh conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, performed the computation work, reviewed drafts of the paper.

Luiz Olavo Bonino da Silva Santos conceived and designed the experiments, analyzed the data, reviewed drafts of the paper.

Tim Clark conceived and designed the experiments, reviewed drafts of the paper.

Morris A Swertz conceived and designed the experiments, analyzed the data, wrote the paper, reviewed drafts of the paper.

Fleur D.L. Kelpin conceived and designed the experiments, analyzed the data, reviewed drafts of the paper.

Alasdair J. G. Gray conceived and designed the experiments, reviewed drafts of the paper.

Erik A. Schultes conceived and designed the experiments, analyzed the data, wrote the paper, reviewed drafts of the paper.

Erik M. van Mulligen conceived and designed the experiments, reviewed drafts of the paper.

Paolo Ciccarese conceived and designed the experiments, reviewed drafts of the paper.

Arnold Kuzniar conceived and designed the experiments, analyzed the data, reviewed drafts of the paper.

Anand Gavai conceived and designed the experiments, analyzed the data, reviewed drafts of the paper.

Mark Thompson conceived and designed the experiments, analyzed the data, wrote the paper, reviewed drafts of the paper.

Rajaram Kaliyaperumal conceived and designed the experiments, analyzed the data, wrote the paper, reviewed drafts of the paper.

Jerven T. Bolleman analyzed the data, contributed reagents/materials/analysis tools, reviewed drafts of the paper, fixed the demonstrative query, clarified the semantics of UniProt, corrected erroneous ontological annotations.

Michel Dumontier conceived and designed the experiments, performed the experiments, analyzed the data, wrote the paper, reviewed drafts of the paper.

Data Deposition

The following information was supplied regarding data availability:

The manuscript describes a set of practices and behaviors that combine third-party technologies and standards in a novel manner. This does not (necessarily) require novel, dedicated software, and therefore a repository is not provided. The paper uses only public data for its demonstration, and the query to retrieve that data from source is provided in the manuscript text (the curator of that data is UniProt, the data is being used/republished with their explicit permission, and a member of their team is a co-author on the manuscript).

Funding

The lead author is supported by the Fundacion BBVA + UPM Isaac Peral programme, and the Spanish Ministerio de Economía y Competitividad grant number TIN2014-55993-R. RV is a postdoctoral fellow of the Research Foundation – Flanders. Additional support for FAIR Skunkworks members comes from European Union funded projects ELIXIR-EXCELERATE (H2020 no. 676559), ADOPT BBMRI-ERIC (H2020 no. 676550) and CORBEL (H2020 no. 654248). Portions of this work have been funded by Netherlands Organisation for Scientific Research (Odex4all project), Stichting Topconsortium voor Kennis en Innovatie High Tech Systemen en Materialen (FAIRdICT project), BBMRI-NL, RD-Connect and ELIXIR (Rare disease implementation study FP7 no. 305444). UniProt is mainly supported by the National Institutes of Health (NIH), National Human Genome Research Institute (NHGRI) and National Institute of General Medical Sciences (NIGMS) grant U41HG007822. Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the State Secretariat for Education, Research and Innovation SERI. The FAIR Data engineering team is supported by the Dutch Techcentre for Life Sciences and collaborates closely with the Biosemantics department at Leiden University Medical Center. ELIXIR sponsored several hackathons that initiated the project, and significant portions of this work were then undertaken at the BioHackathon 2016, supported by the Integrated Database Project (Ministry of Education, Culture, Sports, Science and Technology, Japan), the National Bioscience Database Center (NBDC - Japan), and the Database Center for Life Sciences (DBCLS - Japan). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

