sparql ly_InterMines
- Gos Micklem (happy to be the primary contact until others volunteer)
- Alex Kalderimis
- Jerven Bolleman
Context of the hacking activity: what problem are we trying to solve?
InterMine [A] is an open-source, graph-based data warehouse system built on top of PostgreSQL. Through a collaboration with most of the main Model Organism Databases (MODs), the InterMOD consortium [B], InterMine databases are now available for budding yeast (SGD) [C], rat (RGD), zebrafish (ZFIN), mouse (MGI), nematode (WormBase) and fly (FlyMine, from the InterMine group) [D], with further MOD InterMine instances expected. Extensive data from the modENCODE project [E] are also available through modMine [F]. To expose these rich data as RDF with associated SPARQL endpoints, it is necessary:
- to describe the services, and
- to provide a SPARQL endpoint.
We have written code that uses the existing InterMine RESTful web services to interrogate the FlyMine database and generate a VoID description of it. Further work is required to extend the core InterMine data model with additional database metadata items; this will allow VoID descriptions to be generated automatically for any InterMine database. Further work is also required to ensure that appropriate standards are adhered to, especially for RDF predicates.

In addition, we have made progress in creating a Sesame [G]-based SPARQL endpoint for InterMine databases to complement the existing web application and web services. At the moment the endpoint supports only a small range of simple queries. In the future we hope that such endpoints will make available the rich data assembled and curated by the worldwide Model Organism Database community. In the process, they should provide opportunities for interoperation and a mechanism for federation across different systems.
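The VoID generation step described above can be sketched as follows. This is not the project's code: it assumes per-class object counts have already been fetched from a mine's REST web services (the fetching is not shown), and the function names, URIs and numbers are hypothetical placeholders. It emits a `void:Dataset` with its SPARQL endpoint and one `void:classPartition` per InterMine class, using only standard VoID and Dublin Core terms.

```python
# Minimal sketch: build a VoID description (Turtle) from per-class object counts.
# Assumptions: counts were already fetched from a mine's REST web services
# (not shown); every URI passed in is a hypothetical placeholder.

PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


def turtle_string(text: str) -> str:
    """Quote text as a Turtle string literal, escaping \\, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def void_description(dataset_uri: str, title: str, endpoint_uri: str,
                     class_ns: str, class_counts: dict[str, int]) -> str:
    """Return Turtle for one void:Dataset with a class partition per class."""
    statements = [
        "a void:Dataset",
        f"dcterms:title {turtle_string(title)}",
        f"void:sparqlEndpoint <{endpoint_uri}>",
    ]
    # One blank-node partition per InterMine class (e.g. Gene, Protein),
    # sorted so the output is deterministic.
    partitions = [
        f"[ void:class <{class_ns}{cls}> ; void:entities {count} ]"
        for cls, count in sorted(class_counts.items())
    ]
    if partitions:
        statements.append("void:classPartition " + " ,\n        ".join(partitions))
    body = " ;\n    ".join(statements)
    return f"{PREFIXES}\n<{dataset_uri}> {body} .\n"


if __name__ == "__main__":
    # Illustrative numbers only; these are not real FlyMine counts.
    print(void_description(
        "http://example.org/mymine/void#dataset",
        "MyMine",
        "http://example.org/mymine/sparql",
        "http://example.org/mymine/model#",
        {"Gene": 3, "Protein": 2},
    ))
```

A dataset-level `void:entities` total is deliberately omitted: InterMine classes inherit from one another, so summing per-class counts could double-count objects.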
- Made an adapter for JBrowse to run on InterMine services, using a simple REST service.
- Automatically generates an ontology.
- Runs SPARQL against the InterMine REST services. Slow but correct! We will make it fast later.
- Needs no new infrastructure, and already works correctly.
- Next step: query federation.
- Developing D3-based visualisations for InterMine (which is a graph-based datastore).
Will be made available via the InterMine GitHub repository (see http://www.intermine.org).
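The approach of running SPARQL against the existing REST services can be illustrated with a minimal sketch: translate a very small SPARQL basic graph pattern into an InterMine PathQuery, then hand it to the mine's query-results service. This is not the project's translator. It assumes triple patterns arrive already parsed as `(subject, predicate, object)` strings, that `a` means `rdf:type`, and that every other predicate is an attribute name on the subject's InterMine class; it handles one typed subject and equality constraints only. The PathQuery XML shape and the `/service/query/results` path follow the public InterMine web services as we understand them, while `bgp_to_pathquery`, `results_url` and the example mine URL are hypothetical.

```python
# Minimal sketch: translate a tiny SPARQL basic graph pattern into an InterMine
# PathQuery. Assumptions (not the project's code): patterns are pre-parsed
# (subject, predicate, object) strings, variables start with "?", "a" means
# rdf:type, and any other predicate is an attribute of the subject's class.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def bgp_to_pathquery(patterns, model="genomic"):
    """Return (PathQuery XML string, output variables in column order)."""
    types = {s: o for s, p, o in patterns if p == "a"}
    if len(types) != 1:
        raise ValueError("this sketch supports exactly one typed subject")
    (subject, cls), = types.items()

    view, variables, constraints = [], [], []
    for s, p, o in patterns:
        if p == "a":
            continue
        if s != subject:
            raise ValueError(f"unsupported subject {s!r}")
        path = f"{cls}.{p}"  # e.g. Gene.symbol
        if o.startswith("?"):
            view.append(path)  # variable object becomes an output column
            variables.append(o)
        else:
            constraints.append((path, o.strip('"')))  # literal becomes "=" constraint
    if not view:
        raise ValueError("at least one attribute must be selected")

    query = ET.Element("query", model=model, view=" ".join(view))
    for i, (path, value) in enumerate(constraints):
        # Constraint codes A, B, ... (this sketch assumes at most 26 constraints).
        ET.SubElement(query, "constraint", path=path, op="=",
                      value=value, code=chr(ord("A") + i))
    return ET.tostring(query, encoding="unicode"), variables


def results_url(mine_base, query_xml):
    """Build a GET URL for a mine's query-results service (tab-separated rows)."""
    return f"{mine_base}/service/query/results?" + urlencode(
        {"query": query_xml, "format": "tab"})


if __name__ == "__main__":
    xml, cols = bgp_to_pathquery([
        ("?g", "a", "Gene"),
        ("?g", "primaryIdentifier", "?id"),
        ("?g", "symbol", '"zen"'),
    ])
    print(cols, xml)
    print(results_url("https://example.org/mymine", xml))  # hypothetical mine URL
```

A fuller translator would also need joins across references (for example `Gene.organism.name`) and FILTER handling; the sketch stops at single-class attribute patterns to keep the mapping from triple pattern to path visible.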
[A] InterMine
Smith RN, Aleksic J, Butano D, Carr A, Contrino S, Hu F, Lyne M, Lyne R, Kalderimis A, Rutherford K, Stepan R, Sullivan J, Wakeling M, Watkins X, Micklem G. (2012) InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics. Dec 1;28(23):3163-5. PMID: 23023984
[B] InterMOD
Sullivan J, Karra K, Moxon SAT, Vallejos A, Motenko H, Wong JD, Aleksic J, Balakrishnan R, Binkley G, Harris T, Hitz B, Jayaraman P, Lyne R, Neuhauser S, Pich C, Smith RN, Trinh Q, Cherry JM, Richardson J, Stein L, Twigger S, Westerfield M, Worthey E, Micklem G (2013) InterMOD: integrated data and tools for the unification of model organism research. Scientific Reports 3:1802. doi: 10.1038/srep01802. PMID: 23652793
[C] YeastMine
Balakrishnan R, Park J, Karra K, Hitz BC, Binkley G, Hong EL, Sullivan J, Micklem G, Michael Cherry J. (2012) YeastMine--an integrated data warehouse for Saccharomyces cerevisiae data as a multipurpose tool-kit. Database (Oxford). Mar 20;2012:bar062. PMCID: PMC3308152
[D] FlyMine
Lyne R, Smith R, Rutherford K, Wakeling M, Riley T, Guillier F, Ji W, Mclaren P, Woodbridge M, Janssens H, Watkins X, Rana D, Varley A, Lilley K, Russell S, Ashburner M, Mizuguchi K, Micklem G (2007) FlyMine: an integrated database for Drosophila and Anopheles genomics. Genome Biology 8, R129. PMID: 17615057
[E] modENCODE marker paper
Celniker SE, Dillon LAL, Gerstein MB, Gunsalus KC, Henikoff S, Karpen GH, Kellis M, Lai EC, Lieb JD, MacAlpine DM, Micklem G, Piano F, Snyder M, Stein L, White KP, Waterston RH & modENCODE Consortium (2009) Unlocking the secrets of the genome. Nature 459, 927-930. PMID: 19536255
[F] modMine
Contrino S, Smith RN, Butano D, Carr A, Hu F, Lyne R, Rutherford K, Kalderimis A, Sullivan J, Carbon S, Kephart E, Lloyd P, Stinson E, Washington N, Perry M, Ruzanov P, Zha Z, Lewis SE, Stein LD, Micklem G (2012) modMine: flexible access to modENCODE data. Nucleic Acids Research Jan;40(Database issue):D1082-8. PMID: 22080565
[G] Sesame