DOI: 10.1145/2452376.2452478
tutorial

The W3C PROV family of specifications for modelling provenance metadata

Published: 18 March 2013

Abstract

Provenance, a form of structured metadata designed to record the origin or source of information, can be instrumental in deciding whether information is to be trusted, how it can be integrated with other diverse information sources, and how to establish attribution of information to authors throughout its history. The PROV set of specifications, produced by the World Wide Web Consortium (W3C), is designed to promote the publication of provenance information on the Web, and offers a basis for interoperability across diverse provenance management systems. The PROV provenance model is deliberately generic and domain-agnostic, but extension mechanisms are available and can be exploited for modelling specific domains. This tutorial provides an account of these specifications. Starting from intuitive and informal examples that present idiomatic provenance patterns, it progressively introduces the relational model of provenance along with the constraints model for validation of provenance documents, and concludes with example applications that show the extension points in use.
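As a rough illustration of the relational view of provenance that the abstract refers to, the sketch below stores each PROV relation as a set of tuples in plain Python. The relation names (used, wasGeneratedBy, wasAssociatedWith, wasDerivedFrom) come from PROV itself; the `ProvDocument` class, the `sources_of` helper, and the `ex:` identifiers are hypothetical and exist only for this example. It is not the tutorial's own encoding.

```python
from dataclasses import dataclass, field


@dataclass
class ProvDocument:
    """Toy in-memory store (hypothetical): one set of tuples per PROV relation."""
    entities: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    agents: set = field(default_factory=set)
    used: set = field(default_factory=set)                 # (activity, entity)
    was_generated_by: set = field(default_factory=set)     # (entity, activity)
    was_associated_with: set = field(default_factory=set)  # (activity, agent)
    was_derived_from: set = field(default_factory=set)     # (derived, source)

    def sources_of(self, entity):
        """Return every entity that `entity` derives from, directly or transitively."""
        seen, frontier = set(), [entity]
        while frontier:
            current = frontier.pop()
            for derived, source in self.was_derived_from:
                if derived == current and source not in seen:
                    seen.add(source)
                    frontier.append(source)
        return seen


def build_example():
    """Build a small made-up document: raw data is cleaned, then plotted."""
    doc = ProvDocument()
    doc.entities |= {"ex:raw_data", "ex:clean_data", "ex:chart"}
    doc.activities |= {"ex:cleaning", "ex:plotting"}
    doc.agents |= {"ex:alice"}
    doc.used |= {("ex:cleaning", "ex:raw_data"), ("ex:plotting", "ex:clean_data")}
    doc.was_generated_by |= {("ex:clean_data", "ex:cleaning"), ("ex:chart", "ex:plotting")}
    doc.was_associated_with |= {("ex:cleaning", "ex:alice"), ("ex:plotting", "ex:alice")}
    doc.was_derived_from |= {("ex:clean_data", "ex:raw_data"), ("ex:chart", "ex:clean_data")}
    return doc


doc = build_example()
print(sorted(doc.sources_of("ex:chart")))  # → ['ex:clean_data', 'ex:raw_data']
```

Keeping each relation as a separate set of tuples mirrors a relational schema, so a lineage question such as "what does this chart derive from?" reduces to a transitive closure over one relation.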

References

[1]
P. Agrawal, O. Benjelloun, et al. Trio: a system for data, uncertainty, and lineage. In Proceedings of the 32nd international conference on Very large data bases, VLDB '06, pages 1151--1154. VLDB Endowment, 2006.
[2]
K. Belhajjame et al. Workflow-centric research objects: First class citizens in scholarly discourse. In Proceedings of Sepublica 2012, pages 1--12, Hersonissos, 2012.
[3]
P. Buneman, S. Khanna, and W. C. Tan. Why and Where: A Characterization of Data Provenance. In ICDT, pages 316--330, 2001.
[4]
I. Celino, S. Contessa, M. Corubolo, et al. Linking smart cities datasets with human computation - the case of UrbanMatch. In P. Cudré-Mauroux et al., editors, ISWC, volume 7650 of Lecture Notes in Computer Science, pages 34--49. Springer, 2012.
[5]
J. Cheney, L. Chiticariu, and W.-C. Tan. Provenance in Databases: Why, How, and Where. Foundations and Trends in Databases, 1:379--474, 2009.
[6]
J. Cheney, A. Finkelstein, B. Ludaescher, and S. Vansummeren. Principles of Provenance (Dagstuhl Seminar 12091). Dagstuhl Reports, 2(2):84--113, 2012.
[7]
L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. DBNotes: a post-it system for relational databases based on provenance. In SIGMOD, 2005.
[8]
V. Cuevas-Vicenttin, S. Dey, and B. Ludaescher. Modeling and querying scientific workflow provenance in the D-OPM. In WORKS. ACM, 2012.
[9]
E. Deelman, D. Gannon, M. S. Shields, and I. Taylor. Workflows and e-science: An overview of workflow system features and capabilities. Future Generation Comp. Syst., 25(5):528--540, 2009.
[10]
M. Ebden, T. D. Huynh, L. Moreau, et al. Network analysis on provenance graphs from a crowdsourcing application. In Groth and Frew [12], pages 168--182.
[11]
R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: semantics and query answering. Theor. Comput. Sci., 336(1):89--124, 2005.
[12]
P. T. Groth and J. Frew, editors. 4th International Provenance and Annotation Workshop, IPAW 2012, Santa Barbara, CA, USA, June 19--21, 2012, volume 7525 of Lecture Notes in Computer Science. Springer, 2012.
[13]
G. Karvounarakis, Z. G. Ives, and V. Tannen. Querying data provenance. In SIGMOD Conference, 2010.
[14]
P. Missier and K. Belhajjame. A PROV encoding for provenance analysis using deductive rules. In Procs. IPAW'12, Santa Barbara, California, 2012. Springer-Verlag, Lecture Notes in Computer Science.
[15]
P. Missier, S. Soiland-Reyes, S. Owen, et al. Taverna, reloaded. In Procs. SSDBM 2010, volume 6187 of Lecture Notes in Computer Science, pages 471--481, Heidelberg, Germany, 2010. Springer.
[16]
H. Yang, D. T. Michaelides, C. Charlton, et al. Deep: A provenance-aware executable document system. In Groth and Frew [12], pages 24--38.

Reviews

Yingjie Li

Provenance is information about the entities, activities, and people involved in producing a piece of data or a thing, which can be used to assess its quality, reliability, or trustworthiness. This paper focuses on a new approach that uses the standard PROV model recommended by the World Wide Web Consortium (W3C) to model provenance. The W3C PROV model defines a core model for provenance representation. Readers working in the semantic web, provenance, and ontology fields will want to study this work.

The first part of the paper provides an intuitive overview of the W3C PROV model, with an example that gives a complete account of the PROV relations among three types of instances: entities, activities, and agents. In PROV, physical, digital, conceptual, or other kinds of things are called entities. [...] Activities are how entities come into existence and how their attributes change to become new entities. [...] An agent takes a role in an activity such that the agent can be assigned some degree of responsibility for the activity taking place. [1] In addition, the validity of provenance statements is defined with reference to a set of constraints that the statements must satisfy. For instance, when one entity is related to another by the predicate prov:wasDerivedFrom, it implies that the source entity was generated before the entity derived from it.

The second part of the paper presents a number of applications that use the PROV model to capture provenance information. In Dictionary, the PROV model asserts the membership of a word in a dictionary and records the history of changes (insertions and removals) to its words. In Scientific Workflows, the PROV model captures information about the data products used and generated by the steps that compose the workflows. As a result, people can more easily debug workflows and reproduce their results. In Executable Documents, the PROV model captures the provenance of each research object to trace its evolution over time. In Smart Cities, the PROV model records provenance information about citizens and their contributions to assist in verifying the collected data.

The paper would have been more complete if the authors had provided a deeper analysis of how the applications apply the PROV model in terms of provenance modeling and querying.

Online Computing Reviews Service
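The ordering constraint that the review mentions for prov:wasDerivedFrom can be sketched as a small check in plain Python. This is a simplification under stated assumptions: derivations are read as (derived, source) pairs, generation events are represented by timestamps, and the source must be generated strictly before the derived entity. The function name `derivation_order_violations` and the `ex:dict_*` identifiers (loosely modelled on the Dictionary application) are hypothetical.

```python
from datetime import datetime


def derivation_order_violations(generated_at, was_derived_from):
    """Return the (derived, source) pairs whose generation times break the ordering.

    generated_at: dict mapping an entity id to its generation time.
    was_derived_from: iterable of (derived, source) pairs.
    Pairs with an unknown generation time are skipped rather than flagged.
    """
    violations = []
    for derived, source in was_derived_from:
        t_derived = generated_at.get(derived)
        t_source = generated_at.get(source)
        if t_derived is None or t_source is None:
            continue
        # Assumption: the source must be generated strictly before the derived entity.
        if not t_source < t_derived:
            violations.append((derived, source))
    return violations


# Made-up dictionary versions: v2 follows v1, but v3 carries an impossible timestamp.
generated_at = {
    "ex:dict_v1": datetime(2013, 1, 1),
    "ex:dict_v2": datetime(2013, 2, 1),
    "ex:dict_v3": datetime(2012, 12, 1),
}
was_derived_from = [
    ("ex:dict_v2", "ex:dict_v1"),
    ("ex:dict_v3", "ex:dict_v2"),
]

print(derivation_order_violations(generated_at, was_derived_from))
# → [('ex:dict_v3', 'ex:dict_v2')]
```

A validator built this way reports the offending derivations instead of rejecting the whole document, which makes it easier to see which statements need correcting.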

Published In

EDBT '13: Proceedings of the 16th International Conference on Extending Database Technology
March 2013
793 pages
ISBN: 9781450315975
DOI: 10.1145/2452376

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Tutorial

Conference

EDBT/ICDT '13

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%
