
sparql-ly InterMines

gmicklem edited this page Jul 30, 2013 · 3 revisions

Sub-group project: sparql-ly InterMines: exposing InterMine databases to the semantic web

Participants (please choose someone who will be the primary contact for the group):

Context of the hacking activity - what is the problem you are trying to solve (references to the primary literature appreciated!) - one paragraph

InterMine [A] is an open-source, graph-based data warehouse system built on top of PostgreSQL. Through a collaboration with most of the main animal Model Organism Databases (MODs), the InterMOD consortium [B], there are now InterMine databases available for budding yeast (SGD) [C], rat (RGD), zebrafish (ZFIN), mouse (MGI), nematode (WormBase) and fly (InterMine group) [D], with further MOD InterMine instances expected. Extensive data from the modENCODE project [E] are also available through modMine [F]. To expose these rich data as RDF with associated SPARQL endpoints, it is necessary

  • to describe the services, and
  • to provide a SPARQL endpoint.

What was your approach and what were the outcomes - one paragraph

We have written code that uses the existing InterMine RESTful web services to interrogate the FlyMine database and generate a VoID description of it. Further work is required to extend the core InterMine data model with additional database metadata items; this will then allow VoID descriptions to be generated automatically for any InterMine database. Further work is also required to ensure that appropriate standards are adhered to, especially for RDF predicates. In addition, we have made progress in creating a Sesame [G]-based SPARQL endpoint for InterMine databases to complement the existing web application and web services. At present the endpoint supports only a small range of simple queries. In future we hope that such endpoints will make available the rich data assembled and curated by the worldwide Model Organism Database community. In the process, this should provide opportunities for interoperation and a mechanism for federation across different systems.
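
As a rough illustration of what a generated VoID description might contain, the sketch below builds one in Turtle from per-class object counts. It is not the project's code: the function names, URIs and counts are hypothetical, and in practice the counts would be obtained from a mine's REST services. The vocabulary terms used (void:Dataset, void:sparqlEndpoint, void:entities, void:classPartition, void:class) come from the W3C VoID vocabulary; summing class counts into void:entities assumes the classes are disjoint.

```python
from __future__ import annotations


def turtle_string(text: str) -> str:
    """Quote a plain literal for Turtle, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def void_description(dataset_uri: str, title: str, endpoint: str,
                     class_counts: dict[str, int], vocab_base: str) -> str:
    """Return a VoID description (Turtle) of one dataset.

    Hypothetical simplification: the classes in class_counts are assumed to be
    disjoint, so their counts are summed into void:entities. InterMine counts
    for a superclass would also include its subclasses.
    """
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f"    dcterms:title {turtle_string(title)} ;",
        f"    void:sparqlEndpoint <{endpoint}> ;",
        f"    void:entities {sum(class_counts.values())} ;",
    ]
    # One class partition per model class, sorted for stable output.
    partitions = [
        f"        [ void:class <{vocab_base}{cls}> ; void:entities {n} ]"
        for cls, n in sorted(class_counts.items())
    ]
    if partitions:
        lines.append("    void:classPartition")
        lines.append(" ,\n".join(partitions) + " .")
    else:
        # No partitions: end the statement on the last property instead.
        lines[-1] = lines[-1].rstrip(" ;") + " ."
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical URIs and counts, for illustration only.
    print(void_description(
        "http://example.org/flymine/void#dataset",
        "FlyMine",
        "http://example.org/flymine/sparql",
        {"Gene": 3, "Protein": 2},
        "http://example.org/flymine/model#",
    ))
```

Running it prints one dataset node with a `void:classPartition` entry per class. Choosing established predicates for the dataset metadata is exactly the standards question the paragraph above leaves open.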

Further details - in progress:

  • Made an adapter so that JBrowse can run on InterMine services, using a simple REST service.
    
  • Automatically generates an ontology.
    
  • Runs SPARQL queries against the InterMine REST services. Slow but correct; speed will be improved later.
    
  • Requires no new infrastructure, and already returns correct results.
    
  • Query federation still needs to be implemented.
    
  • Developing D3-based visualisations for InterMine (which is a graph-based data store).
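
The bullets above describe answering SPARQL by calling the existing InterMine REST services rather than building new infrastructure. The sketch below illustrates the idea for the simplest case, a pattern about a single subject, by rewriting it as an InterMine PathQuery (an XML `query` element with dotted paths such as `Gene.symbol`). It is not the project's Sesame-based code. The `Triple` type, the rule that a predicate maps to an attribute of the same name, and all helper names are hypothetical. The `/service/query/results` path with `format=tab` is an assumption based on InterMine's web-service conventions and should be checked against the target mine. No network request is made.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class Triple:
    """One SPARQL triple pattern, in a hypothetical simplified form."""
    subject: str    # a variable such as "?g"
    predicate: str  # "a" (rdf:type) or an attribute name such as "symbol"
    obj: str        # a class name, a variable such as "?id", or a literal


def to_path_query(triples: list[Triple], model: str = "genomic") -> tuple[str, list[str]]:
    """Translate a single-subject pattern into PathQuery XML.

    Returns the XML and the SPARQL variables in the order of the view columns.
    """
    if len({t.subject for t in triples}) != 1:
        raise ValueError("this sketch handles exactly one subject variable")
    classes = [t.obj for t in triples if t.predicate == "a"]
    if len(classes) != 1:
        raise ValueError("exactly one 'a <Class>' pattern is required")
    root_class = classes[0]

    query = ET.Element("query", model=model)
    views: list[str] = []
    variables: list[str] = []
    for t in triples:
        if t.predicate == "a":
            continue
        path = f"{root_class}.{t.predicate}"
        if t.obj.startswith("?"):
            # Variable object: becomes an output column of the query.
            views.append(path)
            variables.append(t.obj)
        else:
            # Literal object: becomes an equality constraint.
            ET.SubElement(query, "constraint", path=path, op="=", value=t.obj)
    if not views:
        raise ValueError("at least one variable object is needed as output")
    query.set("view", " ".join(views))
    return ET.tostring(query, encoding="unicode"), variables


def results_url(mine_base: str, query_xml: str) -> str:
    """Build a query-results URL (path and format are assumptions; check the mine's docs)."""
    params = urlencode({"query": query_xml, "format": "tab"})
    return f"{mine_base.rstrip('/')}/service/query/results?{params}"


def bind_rows(tsv: str, variables: list[str]) -> list[dict[str, str]]:
    """Turn tab-separated result rows into SPARQL-style variable bindings."""
    return [dict(zip(variables, line.split("\t"))) for line in tsv.splitlines() if line]


if __name__ == "__main__":
    # Roughly: SELECT ?id WHERE { ?g a :Gene ; :symbol "eve" ; :primaryIdentifier ?id }
    pattern = [Triple("?g", "a", "Gene"), Triple("?g", "symbol", "eve"),
               Triple("?g", "primaryIdentifier", "?id")]
    xml, variables = to_path_query(pattern)
    print(results_url("https://example.org/flymine", xml))  # hypothetical mine URL
```

Joins across several subjects, OPTIONAL, FILTER and federation across mines are outside the scope of this sketch; each would need its own translation rule.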
    

Availability of the software & references:

Will be available via the InterMine GitHub repository (see http://www.intermine.org).

References:

[A] InterMine

Smith RN, Aleksic J, Butano D, Carr A, Contrino S, Hu F, Lyne M, Lyne R, Kalderimis A, Rutherford K, Stepan R, Sullivan J, Wakeling M, Watkins X, Micklem G. (2012) InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics. Dec 1;28(23):3163-5. PMID: 23023984

[B] InterMOD

Sullivan J, Karra K, Moxon SAT, Vallejos A, Motenko H, Wong JD, Aleksic J, Balakrishnan R, Binkley G, Harris T, Hitz B, Jayaraman P, Lyne R, Neuhauser S, Pich C, Smith RN, Trinh Q, Cherry JM, Richardson J, Stein L, Twigger S, Westerfield M, Worthey E, Micklem G (2013) InterMOD: integrated data and tools for the unification of model organism research. Scientific Reports 3:1802. doi: 10.1038/srep01802. PMID: 23652793

[C] YeastMine

Balakrishnan R, Park J, Karra K, Hitz BC, Binkley G, Hong EL, Sullivan J, Micklem G, Michael Cherry J. (2012) YeastMine--an integrated data warehouse for Saccharomyces cerevisiae data as a multipurpose tool-kit. Database (Oxford). Mar 20;2012:bar062. PMCID: PMC3308152

[D] FlyMine

Lyne R, Smith R, Rutherford K, Wakeling M, Riley T, Guillier F, Ji W, Mclaren P, Woodbridge M, Janssens H, Watkins X, Rana D, Varley A, Lilley K, Russell S, Ashburner M, Mizuguchi K, and Micklem G (2007) FlyMine: An Integrated database for Drosophila and Anopheles genomics. Genome Biology 8, R129. PMID: 17615057

[E] modENCODE marker paper

Celniker SE, Dillon LAL, Gerstein MB, Gunsalus KC, Henikoff S, Karpen GH, Kellis M, Lai EC, Lieb JD, MacAlpine DM, Micklem G, Piano F, Snyder M, Stein L, White KP, Waterston RH & modENCODE Consortium (2009) Unlocking the secrets of the genome. Nature 459, 927-930. PMID: 19536255

[F] modMine

Contrino S, Smith RN, Butano D, Carr A, Hu F, Lyne R, Rutherford K, Kalderimis A, Sullivan J, Carbon S, Kephart E, Lloyd P, Stinson E, Washington N, Perry M, Ruzanov P, Zha Z, Lewis SE, Stein LD, Micklem G (2012) modMine: flexible access to modENCODE data. Nucleic Acids Research Jan;40(Database issue):D1082-8. PMID: 22080565

[G] Sesame

http://www.openrdf.org/
