H-SPOOL: A SPARQL-based ETL framework for OLAP over linked data with dimension hierarchy extraction
International Journal of Web Information Systems
ISSN: 1744-0084
Article publication date: 15 August 2016
Abstract
Purpose
Linked data (LD) has promoted the publishing of information and the linking of published information. An increasing number of LD data sets contain numerical data such as statistics. For this reason, analyzing numerical facts in LD has attracted attention from diverse domains. This paper aims to support analytical processing of LD.
Design/methodology/approach
This paper proposes a framework called H-SPOOL, which provides a series of SPARQL (SPARQL Protocol and RDF Query Language) queries that extract objects and attributes from LD data sets, converts them into star/snowflake schemas and materializes the relevant triples as fact and dimension tables for online analytical processing (OLAP).
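The transformation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and issues no SPARQL queries: it works on a small in-memory list of triples, and the function name `build_star_schema`, the `ex:` predicates and the toy values are all hypothetical. It also assumes each subject has at most one value per predicate and follows only one level of the dimension hierarchy.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not taken from any real data set.
TRIPLES = [
    ("ex:obs1", "rdf:type", "ex:Observation"),
    ("ex:obs1", "ex:city", "ex:CityA"),
    ("ex:obs1", "ex:value", 10),
    ("ex:obs2", "rdf:type", "ex:Observation"),
    ("ex:obs2", "ex:city", "ex:CityB"),
    ("ex:obs2", "ex:value", 20),
    ("ex:CityA", "ex:locatedIn", "ex:RegionX"),
    ("ex:CityB", "ex:locatedIn", "ex:RegionY"),
]


def build_star_schema(triples, fact_class, measure_preds, hierarchy_pred):
    """Split triples into fact rows and dimension rows (illustrative only)."""
    # Group properties by subject; assumes one value per (subject, predicate).
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o

    fact_rows = []
    dim_rows = {}
    for s, props in by_subject.items():
        if props.get("rdf:type") != fact_class:
            continue  # only instances of the chosen class become facts
        row = {"id": s}
        for p, o in props.items():
            if p == "rdf:type":
                continue
            row[p] = o
            if p not in measure_preds:
                # Non-measure property: treat the object as a dimension member
                # and look up its parent one level up the hierarchy.
                parent = by_subject.get(o, {}).get(hierarchy_pred)
                dim_rows[o] = {"member": o, "parent": parent}
        fact_rows.append(row)
    return fact_rows, dim_rows


facts, dims = build_star_schema(
    TRIPLES, "ex:Observation", {"ex:value"}, "ex:locatedIn"
)
```

Here `facts` plays the role of a fact table keyed by observation, and `dims` plays the role of a dimension table whose `parent` column would become a further table in a snowflake schema.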
Findings
The applicability of H-SPOOL is evaluated using existing LD data sets on the Web, and H-SPOOL successfully performs ETL (extract, transform and load) on these data sets for OLAP. In addition, experiments show that H-SPOOL downloads fewer triples than an existing approach.
Originality/value
H-SPOOL is the first work to extract OLAP-related information from SPARQL endpoints, and it drastically reduces the number of downloaded triples.
Keywords
Acknowledgements
This research was partly supported by the program Research and Development on Real World Big Data Integration and Analysis of the Ministry of Education, Culture, Sports, Science and Technology, Japan.
Citation
Komamizu, T., Amagasa, T. and Kitagawa, H. (2016), "H-SPOOL: A SPARQL-based ETL framework for OLAP over linked data with dimension hierarchy extraction", International Journal of Web Information Systems, Vol. 12 No. 3, pp. 359-378. https://doi.org/10.1108/IJWIS-03-2016-0014
Publisher
Emerald Group Publishing Limited
Copyright © 2016, Emerald Group Publishing Limited