tutorial

The W3C PROV family of specifications for modelling provenance metadata

Authors:

Paolo Missier,

Khalid Belhajjame,

James Cheney

EDBT '13: Proceedings of the 16th International Conference on Extending Database Technology

Pages 773 - 776

https://doi.org/10.1145/2452376.2452478

Published: 18 March 2013


Abstract

Provenance, a form of structured metadata designed to record the origin or source of information, can be instrumental in deciding whether information is to be trusted, how it can be integrated with other diverse information sources, and how to establish attribution of information to authors throughout its history. The PROV set of specifications, produced by the World Wide Web Consortium (W3C), is designed to promote the publication of provenance information on the Web, and offers a basis for interoperability across diverse provenance management systems. The PROV provenance model is deliberately generic and domain-agnostic, but extension mechanisms are available and can be exploited for modelling specific domains. This tutorial provides an account of these specifications. Starting from intuitive and informal examples that present idiomatic provenance patterns, it progressively introduces the relational model of provenance along with the constraints model for validation of provenance documents, and concludes with example applications that show the extension points in use.
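As a rough illustration of the relational view of provenance mentioned in the abstract, the following Python sketch stores a few PROV-style statements as plain tuples and follows derivation edges to find the sources of an entity. The relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`, `wasDerivedFrom`) come from the PROV data model; the identifiers (`ex:raw`, `ex:dataset`, `ex:chart`, and so on) and the helper `ancestors` are hypothetical and are not taken from the tutorial.

```python
from collections import defaultdict

# Hypothetical PROV-style statements, one set of tuples per relation.
entities = {"ex:raw", "ex:dataset", "ex:chart"}
activities = {"ex:clean", "ex:plot"}
agents = {"ex:alice"}

# (activity, entity): the activity used the entity.
used = {("ex:clean", "ex:raw"), ("ex:plot", "ex:dataset")}
# (entity, activity): the entity was generated by the activity.
was_generated_by = {("ex:dataset", "ex:clean"), ("ex:chart", "ex:plot")}
# (activity, agent): the agent bears some responsibility for the activity.
was_associated_with = {("ex:clean", "ex:alice"), ("ex:plot", "ex:alice")}
# (derived, source): the first entity was derived from the second.
was_derived_from = {("ex:dataset", "ex:raw"), ("ex:chart", "ex:dataset")}


def ancestors(entity, derivations):
    """Return every entity reachable from `entity` by following derivation edges."""
    sources = defaultdict(set)
    for derived, source in derivations:
        sources[derived].add(source)
    seen, stack = set(), [entity]
    while stack:
        for source in sources[stack.pop()]:
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


print(sorted(ancestors("ex:chart", was_derived_from)))
```

Keeping one set of tuples per relation is only one possible encoding; it is chosen here because it keeps lineage queries such as `ancestors` short.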


Reviews

Reviewer: Yingjie Li

Provenance is information about entities, activities, and people involved in producing a piece of data or a thing, which can be used to assess its quality, reliability, or trustworthiness. This paper focuses on a new approach using the standard PROV model recommended by the World Wide Web Consortium (W3C) to model provenance. The W3C PROV model defines a core model for provenance representation. Individuals involved in the semantic web, provenance, and ontology fields will want to study this work.

The first part of the paper provides an intuitive overview of the W3C PROV model with an example involving a complete account of PROV relations with three types of instances: entities, activities, and agents. In PROV, physical, digital, conceptual, or other kinds of things are called entities. [...] Activities are how entities come into existence and how their attributes change to become new entities. [...] An agent takes a role in an activity such that the agent can be assigned some degree of responsibility for the activity taking place. [1] In addition, the validity of the provenance statements is defined with reference to a set of constraints that the statements must satisfy. For instance, when two entities are related by the predicate prov:wasDerivedFrom, the source entity must have been generated before the entity derived from it.

The second part of the paper presents a number of applications that use the PROV model to capture provenance information. In Dictionary, the PROV model asserts the membership of a word in a dictionary and records the change (insertion and removal) history of the words. In Scientific Workflows, the PROV model captures information about the data products used and generated by the steps that compose the workflows. As a result, people can easily debug workflows and reproduce the workflow results. In Executable Documents, the PROV model captures the provenance of each research object to trace its evolution over time. In Smart Cities, the PROV model records the provenance information about citizens and their contributions to assist in the verification of collected data.

The paper would have been more complete if the authors had provided a deeper analysis of how the applications apply the PROV model in terms of provenance modeling and querying.

Online Computing Reviews Service
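The ordering constraint on prov:wasDerivedFrom that the review mentions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the validation procedure defined by the PROV specifications: it assumes each entity has at most one known generation time, and the function name `derivation_violations`, the identifiers, and the timestamps are all hypothetical.

```python
from datetime import datetime


def derivation_violations(was_derived_from, generated_at):
    """Return (derived, source) pairs where the derived entity
    was not generated strictly after its source.

    Pairs with an unknown generation time are skipped, not flagged.
    """
    violations = []
    for derived, source in was_derived_from:
        t_derived = generated_at.get(derived)
        t_source = generated_at.get(source)
        if t_derived is None or t_source is None:
            continue
        if not t_source < t_derived:
            violations.append((derived, source))
    return violations


# Hypothetical generation times for three entities.
generated_at = {
    "ex:raw": datetime(2013, 3, 1, 9, 0),
    "ex:dataset": datetime(2013, 3, 1, 10, 0),
    "ex:chart": datetime(2013, 3, 1, 9, 30),  # deliberately earlier than its source
}
# (derived, source) pairs.
was_derived_from = [("ex:dataset", "ex:raw"), ("ex:chart", "ex:dataset")]

print(derivation_violations(was_derived_from, generated_at))
```

Skipping pairs with missing timestamps is a design choice made for this sketch: provenance documents often omit times, and an absent time is not in itself evidence of an inconsistency.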


Information

Published In

EDBT '13: Proceedings of the 16th International Conference on Extending Database Technology

March 2013

793 pages

ISBN:9781450315975

DOI:10.1145/2452376

General Chair: Giovanna Guerrini, Università di Genova, Italy

Program Chair: Norman W. Paton, University of Manchester, UK

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Tutorial

Conference

EDBT/ICDT '13

EDBT/ICDT '13: Joint 2013 EDBT/ICDT Conferences

March 18 - 22, 2013

Genoa, Italy

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%


Bibliometrics

Total Citations: 131

Total Downloads: 676

Downloads (Last 12 months): 72

Downloads (Last 6 weeks): 7

Reflects downloads up to 23 Jan 2025
