research-article

On measuring the lattice of commonalities among several linked datasets

Published: 01 August 2016

Abstract

A large number of datasets have been published according to the principles of Linked Data, and this number keeps increasing. Although the ultimate objective is linking and integration, it is not evident how connected the current LOD cloud is. Measurements (and indexes) that involve more than two datasets are not available, although they are important: (a) for obtaining complete information about one particular URI (or set of URIs) with provenance, (b) for aiding dataset discovery and selection, (c) for assessing the connectivity between any set of datasets for quality checking and for monitoring their evolution over time, and (d) for constructing visualizations that provide more informative overviews. Since it would be prohibitively expensive to perform all these measurements in a naïve way, in this paper we introduce indexes (and their construction algorithms) that can speed up such tasks. In brief, we introduce (i) a namespace-based prefix index, (ii) a sameAs catalog for computing the symmetric and transitive closure of the owl:sameAs relationships encountered in the datasets, (iii) a semantics-aware element index (that exploits the aforementioned indexes), and finally (iv) two lattice-based incremental algorithms for speeding up the computation of the intersection of URIs of any set of datasets. We discuss the speedup obtained by the introduced indexes and algorithms through comparative results, and finally we report measurements about the connectivity of the LOD cloud that have not been carried out before.
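
To make items (ii) and (iii) concrete, the following is a minimal Python sketch and not the paper's algorithms. It assumes a union-find structure for the owl:sameAs closure and a plain dictionary as the element index; the class `SameAsCatalog`, the functions `build_element_index` and `common_elements`, and the toy URIs are all hypothetical names introduced for illustration. The subset counting is done by direct enumeration rather than by the lattice-based incremental algorithms the abstract mentions.

```python
from collections import defaultdict
from itertools import combinations


class SameAsCatalog:
    """Union-find over URIs (an assumed stand-in for a sameAs catalog).

    URIs connected by owl:sameAs, directly or transitively, end up
    with the same representative.
    """

    def __init__(self):
        self.parent = {}

    def find(self, uri):
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node at the root.
        while self.parent[uri] != root:
            self.parent[uri], uri = root, self.parent[uri]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_element_index(datasets, same_as_pairs):
    """Map each equivalence class of URIs to the datasets that mention it."""
    catalog = SameAsCatalog()
    for a, b in same_as_pairs:
        catalog.union(a, b)
    index = defaultdict(set)
    for name, uris in datasets.items():
        for uri in uris:
            index[catalog.find(uri)].add(name)
    return index


def common_elements(index, min_size=2):
    """Count, for every subset of datasets, the classes they all share.

    Direct enumeration: exponential in the number of datasets per class,
    so only suitable for small illustrations.
    """
    counts = defaultdict(int)
    for names in index.values():
        for k in range(min_size, len(names) + 1):
            for subset in combinations(sorted(names), k):
                counts[frozenset(subset)] += 1
    return dict(counts)


# Toy data (hypothetical URIs and dataset names).
datasets = {
    "D1": {"ex1:Aristotle", "ex1:Plato"},
    "D2": {"ex2:Aristotle", "ex2:Socrates"},
    "D3": {"ex3:Aristotle", "ex1:Plato"},
}
same_as = [
    ("ex1:Aristotle", "ex2:Aristotle"),
    ("ex2:Aristotle", "ex3:Aristotle"),
]

index = build_element_index(datasets, same_as)
counts = common_elements(index)
```

In this toy input, D1 and D3 share two real-world entities (Aristotle via the sameAs closure, Plato via an identical URI), while all three datasets share one. Without the closure step, the three Aristotle URIs would be treated as distinct and the three-way intersection would be empty.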



Published In

Proceedings of the VLDB Endowment, Volume 9, Issue 12
August 2016
345 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

