Collaborative data sharing via update exchange and provenance

Published: 05 September 2013

Abstract

Recent work [Ives et al. 2005] proposed a new class of systems for supporting data sharing among scientific and other collaborations: this new collaborative data sharing system connects heterogeneous logical peers using a network of schema mappings. Each peer has a locally controlled and edited database instance, but wants to incorporate related data from other peers as well. To achieve this, every peer's data and updates propagate along the mappings to the other peers. However, this operation, termed update exchange, is filtered by trust conditions—expressing what data and sources a peer judges to be authoritative—which may cause a peer to reject another's updates. In order to support such filtering, updates carry provenance information.
This article develops methods for realizing such systems: we build upon techniques from data integration, data exchange, incremental view maintenance, and view update to propagate updates along mappings, both to derived and optionally to source instances. We incorporate a novel model for tracking data provenance, such that curators may filter updates based on trust conditions over this provenance. We implement our techniques in a layer above an off-the-shelf RDBMS, and we experimentally demonstrate the viability of these techniques in the Orchestra prototype system.
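The update exchange idea summarized above can be illustrated with a small sketch. This is a toy model under stated assumptions, not the Orchestra implementation: the names `Update`, `Mapping`, and `exchange`, the relations `R`, `S`, `T`, and the peers `A`, `B`, `C` are all hypothetical, provenance is reduced to the originating peer plus the list of mappings traversed, and a peer that rejects an update is assumed not to forward it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Update:
    """An inserted tuple plus the (simplified) provenance it carries."""
    relation: str
    row: Tuple
    origin: str                    # peer where the tuple was first inserted
    via: Tuple[str, ...] = ()      # names of the mappings traversed so far


@dataclass
class Mapping:
    """A toy schema mapping from one peer's relation to another's."""
    name: str
    source: str
    target: str
    source_relation: str
    target_relation: str
    transform: Callable[[Tuple], Tuple]


def exchange(updates: List[Update],
             mappings: List[Mapping],
             trust: Dict[str, Callable[[Update], bool]]) -> Dict[str, List[Update]]:
    """Propagate updates along mappings; each peer keeps only what it trusts."""
    instances: Dict[str, List[Update]] = {}
    frontier = [(u.origin, u) for u in updates]
    while frontier:
        peer, u = frontier.pop()
        is_local = not u.via  # a peer always keeps its own edits
        if not is_local and not trust.get(peer, lambda _u: True)(u):
            continue  # rejected by the trust condition: neither stored nor forwarded
        instances.setdefault(peer, []).append(u)
        for m in mappings:
            # Skipping mappings already in the provenance guarantees termination on cycles.
            if (m.source == peer and m.source_relation == u.relation
                    and m.name not in u.via):
                derived = Update(m.target_relation, m.transform(u.row),
                                 u.origin, u.via + (m.name,))
                frontier.append((m.target, derived))
    return instances


# Hypothetical setup: A.R -> B.S -> C.T, and C trusts only data that originated at B.
mappings = [
    Mapping("m1", "A", "B", "R", "S", lambda row: row),
    Mapping("m2", "B", "C", "S", "T", lambda row: row),
]
trust = {"C": lambda u: u.origin == "B"}
updates = [Update("R", (1, "x"), "A"), Update("S", (2, "y"), "B")]

instances = exchange(updates, mappings, trust)
rows = {peer: {u.row for u in us} for peer, us in instances.items()}
```

In this sketch, peer `B` accepts the tuple from `A`, while peer `C` receives both tuples through `m2` but keeps only the one whose provenance shows it originated at `B`.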

References

[1]
Bairoch, A. and Apweiler, R. 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL. Nucleic Acids Res. 28, 1, 45--48.
[2]
Bancilhon, F. and Spyratos, N. 1981. Update semantics of relational views. ACM Trans. Datab. Syst. 6, 4, 557--575.
[3]
Benjelloun, O., Sarma, A. D., Halevy, A. Y., and Widom, J. 2006. ULDBs: Databases with uncertainty and lineage. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB'06).
[4]
Bernstein, P. A., Giunchiglia, F., Kementsietsidis, A., Mylopoulos, J., Serafini, L., and Zaihrayeu, I. 2002. Data management for peer-to-peer computing: A vision. In Proceedings of the 5th International Workshop on the Web and Databases (WebDB'02).
[5]
Bohannon, A., Pierce, B. C., and Vaughan, J. A. 2006. Relational lenses: A language for updateable views. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'06). 338--347.
[6]
Boulakia, S. C., Biton, O., Davidson, S. B., and Froidevaux, C. 2007. BioGuideSRS: Querying multiple sources with a user-centric perspective. Bioinf. 23, 10, 1301--1303.
[7]
Buneman, P., Khanna, S., and Tan, W. C. 2001. Why and where: A characterization of data provenance. In Proceedings of the 8th International Conference on Database Theory (ICDT'01). 316--330.
[8]
Calvanese, D., Giacomo, G. D., Lenzerini, M., and Rosati, R. 2004. Logical foundations of peer-to-peer data integration. In Proceedings of the 23rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'04). 241--251.
[9]
Carey, M. J., Florescu, D., Ives, Z. G., Lu, Y., Shanmugasundaram, J., Shekita, E., and Subramanian, S. 2000. XPERANTO: Publishing object-relational data as XML. In Proceedings of the 3rd International Workshop on the Web and Databases (WebDB'00).
[10]
Cheney, J., Chiticariu, L., and Tan, W. C. 2009. Provenance in databases: Why, how, and where. Foundat. Trends Databases 1, 4.
[11]
Chiticariu, L. and Tan, W.-C. 2006. Debugging schema mappings with routes. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB'06). 79--90.
[12]
Cui, Y. 2001. Lineage tracing in data warehouses. Ph.D. thesis, Stanford University.
[13]
Cui, Y. and Widom, J. 2001. Lineage tracing for general data warehouse transformations. Int. J. Very Large Databases 12, 1, 41--58.
[14]
Dayal, U. and Bernstein, P. A. 1982. On the correct translation of update operations on relational views. ACM Trans. Datab. Syst. 7, 3, 381--416.
[15]
Deutsch, A., Popa, L., and Tannen, V. 2006. Query reformulation with constraints. ACM SIGMOD Rec. 35, 1, 65--73.
[16]
Deutsch, A. and Tannen, V. 2005. XML queries and constraints, containment and reformulation. Theor. Comput. Sci. 336, 1, 57--87.
[17]
Duschka, O. M. and Genesereth, M. R. 1997. Answering recursive queries using views. In Proceedings of the 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'97). 109--116.
[18]
Enderton, H. B. 1972. A Mathematical Introduction to Logic, 1st ed. Academic Press.
[19]
Fagin, R., Kolaitis, P., Miller, R. J., and Popa, L. 2005. Data exchange: Semantics and query answering. In Proceedings of the 9th International Conference on Database Theory (ICDT'05). 207--224.
[20]
Fan, H. and Poulovassilis, A. 2005. Using schema transformation pathways for data lineage tracing. In Proceedings of the 22nd British National Conference on Databases (BNCOD'05). Vol. 1.
[21]
Friedman, M., Levy, A. Y., and Millstein, T. D. 1999. Navigational plans for data integration. In Proceedings of the 16th National Conference on Artificial Intelligence and the 11th Innovative Applications of Artificial Intelligence Conference (AAAI'99). 67--73.
[22]
Fuxman, A., Kolaitis, P. G., Miller, R. J., and Tan, W.-C. 2005. Peer data exchange. ACM Trans. Datab. Syst. 31, 4, 1454--1498.
[23]
Gatterbauer, W. and Suciu, D. 2010. Data conflict resolution using trust relationships. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'10).
[24]
Green, T. J., Karvounarakis, G., Ives, Z. G., and Tannen, V. 2007a. Update exchange with mappings and provenance. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB'07). 675--686.
[25]
Green, T. J., Karvounarakis, G., and Tannen, V. 2007b. Provenance semirings. In Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'07). 31--40.
[26]
Green, T. J., Taylor, N., Karvounarakis, G., Biton, O., Ives, Z., and Tannen, V. 2007c. Orchestra: Facilitating collaborative data sharing. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07).
[27]
Gupta, A. and Mumick, I. S. 1995. Maintenance of materialized views: Problems, techniques, and applications. Data Engin. Bull. 18, 2.
[28]
Gupta, A., Mumick, I. S., and Subrahmanian, V. S. 1993. Maintaining views incrementally. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'93).
[29]
Halevy, A. Y., Ives, Z. G., Suciu, D., and Tatarinov, I. 2003. Schema mediation in peer data management systems. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03).
[30]
He, H., Wang, H., Yang, J., and Yu, P. S. 2007. BLINKS: Ranked keyword searches on graphs. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07).
[31]
Hernich, A. and Schweikardt, N. 2007. CWA-solutions for data exchange settings with target dependencies. In Proceedings of the 26th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'07).
[32]
Ives, Z., Khandelwal, N., Kapur, A., and Cakir, M. 2005. Orchestra: Rapid, collaborative sharing of dynamic data. In Proceedings of the Conference on Innovative Data Systems Research (CIDR'05).
[33]
Karvounarakis, G. 2009. Provenance for collaborative data sharing. Ph.D. thesis, University of Pennsylvania.
[34]
Karvounarakis, G. and Ives, Z. G. 2008. Bidirectional mappings for data and update exchange. In Proceedings of the 11th International Workshop on the Web and Databases (WebDB'08).
[35]
Karvounarakis, G., Ives, Z. G., and Tannen, V. 2010. Querying data provenance. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'10).
[36]
Keller, A. M. 1985. Algorithms for translating view updates to database updates for views involving selections, projections, and joins. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'85).
[37]
Kementsietsidis, A., Arenas, M., and Miller, R. J. 2003. Mapping data in peer-to-peer systems: Semantics and algorithmic issues. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03).
[38]
Kot, L. and Koch, C. 2009. Cooperative update exchange in the Youtopia system. Proc. VLDB Endow. 2, 1, 193--204.
[39]
Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'02). 233--246.
[40]
Levy, A. Y., Rajaraman, A., and Ordille, J. J. 1996. Querying heterogeneous information sources using source descriptions. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB'96). 251--262.
[41]
Libkin, L. 2006. Data exchange and incomplete information. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'06). 60--69.
[42]
Lu, J. J., Moerkotte, G., Schue, J., and Subrahmanian, V. 1995. Efficient maintenance of materialized mediated views. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'95). 340--351.
[43]
McBrien, P. J. and Poulovassilis, A. 2006. P2P query reformulation over both-as-view data transformation rules. In Proceedings of the 4th International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P'06).
[44]
Mork, P., Shaker, R., Halevy, A., and Tarczy-Hornoch, P. 2002. PQL: A declarative query language over dynamic biological schemata. In Proceedings of the American Medical Informatics Association Symposium (AMIA'02).
[45]
Mumick, I. S., Pirahesh, H., and Ramakrishnan, R. 1990. The magic of duplicates and aggregates. In Proceedings of the 16th International Conference on Very Large Data Bases (VLDB'90). 264--277.
[46]
Mumick, I. S. and Shmueli, O. 1993. Finiteness properties of database queries. In Proceedings of the 4th Australian Database Conference.
[47]
Popa, L., Velegrakis, Y., Miller, R. J., Hernandez, M. A., and Fagin, R. 2002. Translating web data. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB'02). 598--609.
[48]
Talukdar, P. P., Jacob, M., Mehmood, M. S., Crammer, K., Ives, Z. G., Pereira, F., and Guha, S. 2008. Learning to create data-integrating queries. Proc. VLDB Endow. 1, 1, 785--796.
[49]
Taylor, N. E. and Ives, Z. G. 2006. Reconciling while tolerating disagreement in collaborative data sharing. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 13--24.
[50]
Taylor, N. E. and Ives, Z. G. 2010. Reliable storage and querying for collaborative data sharing systems. In Proceedings of the 26th IEEE International Conference on Data Engineering (ICDE'10). 40--51.

Published In

ACM Transactions on Database Systems  Volume 38, Issue 3
August 2013
266 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/2508020

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 September 2013
Accepted: 01 June 2013
Revised: 01 April 2013
Received: 01 April 2012
Published in TODS Volume 38, Issue 3

Author Tags

  1. Data exchange
  2. data provenance
  3. schema mappings
  4. systems
  5. update exchange

Qualifiers

  • Research-article
  • Research
  • Refereed
