Provenance Analytics for Workflow-Based Computational Experiments: A Survey

Published: 23 May 2018

Abstract

Until recently, manually capturing and storing provenance from scientific experiments was a constant concern for scientists. With the advent of computational experiments (modeled as scientific workflows) and Scientific Workflow Management Systems, the data produced and consumed by an experiment, as well as its provenance, are managed automatically, so capturing and storing provenance in this context is no longer a major concern. As with many big data problems, the central question is now how to analyze the large amounts of provenance data generated by workflow executions and how to extract useful knowledge from this data. In this context, this article surveys the current state of the art in provenance analytics by presenting the key initiatives that support provenance data analysis. We also propose a taxonomy to classify elements related to provenance analytics.

Published In

ACM Computing Surveys  Volume 51, Issue 3
May 2019
796 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3212709
Editor: Sartaj Sahni
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 May 2018
Accepted: 01 January 2018
Revised: 01 June 2017
Received: 01 December 2016
Published in CSUR Volume 51, Issue 3

Author Tags

  1. Provenance
  2. data analytics
  3. scientific experiments
  4. scientific workflows

Qualifiers

  • Survey
  • Research
  • Refereed
