research-article

A fine-grained evaluation of SPARQL endpoint federation systems

Published: 01 January 2016

Abstract

The Web of Data has grown enormously over the last years. Currently, it comprises a large compendium of interlinked and distributed datasets from multiple domains. Running complex queries on this compendium often requires accessing data from different endpoints within one query. The abundance of datasets and the need for running complex queries have thus motivated a considerable body of work on SPARQL query federation systems, the dedicated means to access data distributed over the Web of Data. However, the granularity of previous evaluations of such systems has not allowed insights to be derived concerning their behavior in the different steps of federated query processing. In this work, we perform extensive experiments to compare state-of-the-art SPARQL endpoint federation systems using the comprehensive performance evaluation framework FedBench. In addition to considering the traditional query runtime as an evaluation criterion, we extend the scope of our performance evaluation by considering criteria that have received little attention in previous studies. In particular, we consider the number of sources selected, the total number of SPARQL ASK requests used, the completeness of answers, and the source selection time. We show that these criteria have a significant impact on the overall query runtime of existing systems. Moreover, we extend FedBench to mirror a highly distributed data environment and assess the behavior of existing systems using the same performance criteria. As a result, we provide a detailed analysis of the experimental outcomes that reveals novel insights for improving current and future SPARQL federation systems.


Published In

Semantic Web, Volume 7, Issue 5
2016
93 pages
ISSN: 1570-0844
EISSN: 2210-4968

Publisher

IOS Press

Netherlands

Author Tags

  1. SPARQL federation
  2. Web of Data
  3. RDF
