skip to main content
research-article
Public Access

Capturing Missing Tuples and Missing Values

Published: 11 May 2016 Publication History

Abstract

Databases in real life are often neither entirely closed-world nor entirely open-world. Databases in an enterprise are typically partially closed, in which a part of the data is constrained by master data that contains complete information about the enterprise in certain aspects. It has been shown that, despite missing tuples, such a database may turn out to have complete information for answering a query.
This article studies partially closed databases from which both tuples and attribute values may be missing. We specify such a database in terms of conditional tables constrained by master data, referred to as c-instances. We first propose three models to characterize whether a c-instance T is complete for a query Q relative to master data. That is, depending on how missing values in T are instantiated, the answer to Q in T remains unchanged when new tuples are added. We then investigate three problems, to determine (a) whether a given c-instance is complete for a query Q, (b) whether there exists a c-instance that is complete for Q relative to master data available, and (c) whether a c-instance is a minimal-size database that is complete for Q. We establish matching lower and upper bounds on these problems for queries expressed in a variety of languages in each of the three models for specifying relative completeness.

Supplementary Material

a10-deng-apndx.pdf (deng.zip)
Supplemental movie, appendix, image and software files for, Capturing Missing Tuples and Missing Values

References

[1]
Serge Abiteboul and Oliver M. Duschka. 1998. Complexity of answering queries using materialized views. In Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’98). ACM, New York, NY, 254--263.
[2]
Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley, New York, NY.
[3]
Serge Abiteboul, Paris C. Kanellakis, and Gösta Grahne. 1991. On the representation and querying of sets of possible worlds. Theoretical Computer Science 78, 1, 159--187.
[4]
Marcelo Arenas, Leopoldo Bertossi, and Jan Chomicki. 1999. Consistent query answers in inconsistent databases. In Proceedings of the 18th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’99). ACM, New York, NY, 68--79.
[5]
Marcelo Arenas, Jorge Pérez, Juan L. Reutter, and Cristian Riveros. 2009. Composition and inversion of schema mappings. SIGMOD Record 38, 3, 17--28.
[6]
Yang Cao, Ting Deng, Wenfei Fan, and Floris Geerts. 2014. On the data complexity of relative information completeness. Information Systems 45, 18--34.
[7]
Jan Chomicki. 2007. Consistent query answering: Five easy pieces. In Proceedings of the 11th International Conference on Database Theory (ICDT’07). Springer-Verlag, New York, NY, 1--17.
[8]
Charles Elkan. 1990. Independence of logic database queries and update. In Proceedings of the 9th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, New York, NY, 154--160.
[9]
Wenfei Fan. 2008. Dependencies revisited for improving data quality. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’08). ACM, New York, NY, 159--170.
[10]
Wenfei Fan and Floris Geerts. 2009. Relative information completeness. In Proceedings of the 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’09). ACM, New York, NY, 97--106.
[11]
Wenfei Fan and Floris Geerts. 2010a. Capturing missing tuples and missing values. In Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’10). ACM, New York, NY, 169--178.
[12]
Wenfei Fan and Floris Geerts. 2010b. Relative information completeness. ACM Transactions on Database Systems 35, 4, Article 27, 44 pages.
[13]
Wenfei Fan and Floris Geerts. 2012. Foundations of Data Quality Management. Morgan & Claypool Publishers, San Francisco, CA.
[14]
Georg Gottlob and Roberto Zicari. 1988. Closed world databases opened through null values. In Proceedings of the 14th International Conference on Very Large Data Bases (VLDB’88). Morgan Kaufmann Publishers Inc., San Francisco, CA, 50--61. http://dl.acm.org/citation.cfm?id=645915.671794
[15]
Gösta Grahne. 1991. The Problem of Incomplete Information in Relational Databases. Lecture Notes in Computer Science, Vol. 554. Springer.
[16]
Tomasz Imieliński and Witold Lipski, Jr. 1984. Incomplete information in relational databases. Journal of the ACM 31, 4, 761--791.
[17]
Phokion G. Kolaitis. 2005. Schema mappings, data exchange, and metadata management. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’05). ACM, New York, NY, 61--75.
[18]
Willis Lang, Rimma V. Nehme, Eric Robinson, and Jeffrey F. Naughton. 2014. Partial results in database systems. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD’14). ACM, New York, NY, 1275--1286.
[19]
Maurizio Lenzerini. 2002. Data integration: A theoretical perspective. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’02). ACM, New York, NY, 233--246.
[20]
Alon Levy, Inderpal Singh Mumick, Yehoshua Sagiv, and Oded Shmueli. 1993. Equivalence, query-reachability and satisfiability in datalog extensions. In Proceedings of the 12th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS’93). ACM, New York, NY, 109--122.
[21]
Alon Y. Levy. 1996. Obtaining complete answers from incomplete databases. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB’96). Morgan Kaufmann Publishers Inc., San Francisco, CA, 402--412. http://dl.acm.org/citation.cfm?id=645922.673332
[22]
Alon Y. Levy and Yehoshua Sagiv. 1993. Queries independent of updates. In Proceedings of the 19th International Conference on Very Large Data Bases (VLDB’93). Morgan Kaufmann Publishers Inc., San Francisco, CA, 171--181. http://dl.acm.org/citation.cfm?id=645919.672674
[23]
Leonid Libkin. 2014. Incomplete data: What went wrong, and how to fix it. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’14). ACM, New York, NY, 1--13.
[24]
David Loshin. 2008. Master Data Management. Morgan Kaufmann Publishers Inc., San Francisco, CA.
[25]
Microsoft. 2008. SQL Server 2008 R2 Master Data Services. Retrieved April 3, 2016 from http://www.microsoft.com/sqlserver/2008/en/us/MDS.aspx.
[26]
Donald W. Miller Jr., John D. Yeast, and Robin L. Evans. 2005. Missing prenatal records at a birth center: A communication problem quantified. In AMIA Annual Symposium Proceedings (AMIA’05). 535--539.
[27]
Amihai Motro. 1989. Integrity = validity + completeness. ACM Transactions on Database Systems 14, 4, 480--502.
[28]
Dan Olteanu, Christoph Koch, and Lyublena Antova. 2008. World-set decompositions: Expressiveness and efficient algorithms. Theoretical Computer Science 403, 2--3, 265--284.
[29]
Christos H. Papadimitriou. 1994. Computational Complexity. Addison-Wesley, New York, NY.
[30]
John Radcliffe and Andrew White. 2008. Key issues for Master Data Management. Gartner. Retrieved April 3, 2016 from https://www.gartner.com/doc/590507/key-issues-master-data-management.
[31]
Luc Segoufin and Victor Vianu. 2005. Views and queries: Determinacy and rewriting. In Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS’05). ACM, New York, NY, 49--60.
[32]
Mark Spielmann. 2000. Abstract State Machines: Verification Problems and Complexity. Ph.D. Dissertation. RWTH Aachen University, Aachen, Germany.
[33]
Boris Trakhtenbrot. 1950. The impossibility of an algorithm for the decidability problem on finite classes. Doklady Akademii Nauk SSSR 70, 4, 569--572.
[34]
Ron van der Meyden. 1998. Logical approaches to incomplete information: A survey. In Logics for Databases and Information Systems, J. Chomicki and G. Saake (Eds.). Kluwer, 307--356.
[35]
Moshe Vardi. 1986. On the integrity of databases with incomplete information. In Proceedings of the 5th ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (PODS’86). ACM, New York, NY, 252--266.
[36]
Michael Wooldridge and Paul E. Dunne. 2004. On the computational complexity of qualitative coalitional games. Artificial Intelligence 158, 1, 27--73.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 41, Issue 2
Invited Paper from SIGMOD 2014 and Regular Papers
June 2016
271 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/2936309
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 May 2016
Accepted: 01 December 2015
Revised: 01 October 2015
Received: 01 March 2015
Published in TODS Volume 41, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Incomplete information
  2. complexity
  3. master data management
  4. partially closed databases
  5. relative completeness

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)80
  • Downloads (Last 6 weeks)11
Reflects downloads up to 01 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media