skip to main content
research-article

Determining the Currency of Data

Published: 01 December 2012 Publication History

Abstract

Data in real-life databases become obsolete rapidly. One often finds that multiple values of the same entity reside in a database. While all of these values were once correct, most of them may have become stale and inaccurate. Worse still, the values often do not carry reliable timestamps. With this comes the need for studying data currency, to identify the current value of an entity in a database and to answer queries with the current values, in the absence of reliable timestamps.
This article investigates the currency of data. (1) We propose a model that specifies partial currency orders in terms of simple constraints. The model also allows us to express what values are copied from other data sources, bearing currency orders in those sources, in terms of copy functions defined on correlated attributes. (2) We study fundamental problems for data currency, to determine whether a specification is consistent, whether a value is more current than another, and whether a query answer is certain no matter how partial currency orders are completed. (3) Moreover, we identify several problems associated with copy functions, to decide whether a copy function imports sufficient current data to answer a query, whether a copy function can be extended to import necessary current data for a query while respecting the constraints, and whether it suffices to copy data of a bounded size. (4) We establish upper and lower bounds of these problems, all matching, for combined complexity and data complexity, and for a variety of query languages. We also identify special cases that warrant lower complexity.

References

[1]
Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley.
[2]
Berti-Equille, L., Sarma, A. D., Dong, X., Marian, A., and Srivastava, D. 2009. Sailing the information ocean with awareness of currents: Discovery and application of source dependence. In Proceedings of the 4th Biennial Conference on Innovative Data Systems Research.
[3]
Bertossi, L. 2006. Consistent query answering in databases. SIGMOD Rec. 35, 2, 68--76.
[4]
Bodirsky, M. and Kára, J. 2010. The complexity of temporal constraint satisfaction problems. J. ACM 57, 2.
[5]
Buneman, P., Cheney, J., Tan, W., and Vansummeren, S. 2008. Curated databases. In Proceedings of the 27th Symposium on Principles of Database Systems. 1--12.
[6]
Cheney, J., Chiticariu, L., and Tan, W. C. 2009. Provenance in databases: Why, how, and where. Found. Trends Datab. 1, 4, 379--474.
[7]
Chomicki, J. 2007. Consistent query answering: Five easy pieces. In Proceedings of the 11th International Conference on Database Theory. 1--17.
[8]
Chomicki, J. and Toman, D. 2005. Time in database systems. In Handbook of Temporal Reasoning in Artificial Intelligence, M. Fisher, D. Gabbay, and L. Vila Eds., Elsevier.
[9]
Clifford, J., Dyreson, C. E., Isakowitz, T., Jensen, C. S., and Snodgrass, R. T. 1997. On the semantics of “now” in databases. ACM Trans. Datab. Syst. 22, 2, 171--214.
[10]
Codd, E. F. 1979. Extending the database relational model to capture more meaning. ACM Trans. Datab. Syst. 4, 4, 397--434.
[11]
Deutsch, A., Nash, A., and Remmel, J. B. 2008. The chase revisited. In Proceedings of the 27th Symposium on Principles of Database Systems. 149--158.
[12]
Dong, X., Berti-Equille, L., and Srivastava, D. 2009. Truth discovery and copying detection in a dynamic world. Proc. VLDB Endov. 2, 1, 562--573.
[13]
Dong, X., Berti-Equille, L., Hu, Y., and Srivastava, D. 2010. Global detection of complex copying relationships between sources. Proc. VLDB Endov. 3, 1, 1358--1369.
[14]
Dyreson, C. E., Jensen, C. S., and Snodgrass, R. T. 2009. Now in temporal databases. In Encyclopedia of Database Systems, L. Liu and M. T. Özsu Eds., Springer.
[15]
Eckerson, W. W. 2002. Data quality and the bottom line: Achieving business success through a commitment to high quality data. Data Warehousing Institute.
[16]
Elmagarmid, A. K., Ipeirotis, P. G., and Verykios, V. S. 2007. Duplicate record detection: A survey. IEEE Trans. Knowl. Data Engin. 19, 1, 1--16.
[17]
Fan, W. and Geerts, F. 2011. Relative information completeness. ACM Trans. Datab. Syst. 35, 4.
[18]
Fan, W., Geerts, F., Jia, X., and Kementsietsidis, A. 2008. Conditional functional dependencies for capturing data inconsistencies. ACM Trans. Datab. Syst. 33, 1.
[19]
Fan, W., Geerts, F., Li, J., and Xiong, M. 2011a. Discovering conditional functional dependencies. IEEE Trans. Knowl. Data Engin. 23, 5, 683--698.
[20]
Fan, W., Geerts, F., and Wijsen, J. 2011b. Determining the currency of data. In Proceedings of the 30th Symposium on Principles of Database Systems. 71--82.
[21]
Garey, M. and Johnson, D. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company.
[22]
Grahne, G. 1991. The Problem of Incomplete Information in Relational Databases. Springer.
[23]
Grohe, M. and Schwandtner, G. 2009. The complexity of datalog on linear orders. Logical Methods Comput. Sci. 5, 1.
[24]
Imieliński, T. and Lipski Jr, W. 1984. Incomplete information in relational databases. J. ACM 31, 4, 761--791.
[25]
Knowledge Integrity. 2003. Two sides to data decay. DM Review.
[26]
Kolaitis, P. G. 2005. Schema mappings, data exchange, and metadata management. In Proceedings of the 24th Symposium on Principles of Database Systems. 61--75.
[27]
Koubarakis, M. 1994. Database models for infinite and indefinite temporal information. Inf. Syst. 19, 2, 141--173.
[28]
Koubarakis, M. 1997. The complexity of query evaluation in indefinite temporal constraint databases. Theor. Comput. Sci. 171, 1--2, 25--60.
[29]
Lenzerini, M. 2002. Data integration: A theoretical perspective. In Proceedings of the 21st Symposium on Principles of Database Systems. 233--246.
[30]
Li, P., Dong, X. L., Mauricio, A., and Srivastava, D. 2011. Linking temporal records. Proc. VLDB Endow. 4, 11, 956--967.
[31]
Papadimitriou, C. H. 1994. Computational Complexity. Addison-Wesley.
[32]
Schwalb, E. and Vila, L. 1998. Temporal constraints: A survey. Constraints 3, 2--3, 129--149.
[33]
Snodgrass, R. T. 1999. Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann.
[34]
Stockmeyer, L. J. 1976. The polynomial-time hierarchy. Theor. Comput. Sci. 3, 1, 1--22.
[35]
van der Meyden, R. 1997. The complexity of querying indefinite data about linearly ordered domains. J. Comput Syst. Sci. 54, 1, 113--135.
[36]
van der Meyden, R. 1998. Logical approaches to incomplete information: A survey. In Logics for Databases and Information Systems, J. Chomicki and G. Saake Eds., Kluwer.
[37]
Vianu, V. 1987. Dynamic functional dependencies and database aging. J. ACM 34, 1, 28--59.
[38]
Zhang, H., Diao, Y., and Immerman, N. 2010. Recognizing patterns in streams with imprecise timestamps. Proc. VLDB Endow. 3, 1, 244--255.

Cited By

View all

Recommendations

Reviews

Elizabeth A. Unger

Dirty data refers to inaccurate, incomplete, or erroneous data in a computer system. One form of dirty data is created when relational databases contain multiple relations for an entity. Determining which of the entity attribute values are correct or most probably correct, especially when the values do not carry timestamps, is the subject of this theoretical study. The authors propose a model using simple constraints to specify partial currency orders in the presence of copy functions. A simple copy function from one relation to another is assumed. Exploring the issues that occur when the copy function is used to create one or more tuples is the primary focus of the paper. The authors discuss what information and functions are needed to decide whether the imported information is sufficient to answer a query, and if it is not, to determine whether the copy function can be extended to bring in necessary current data for a query. Of course, this is done while assuring the constraints are met and the query set is size bounded. The paper identifies seven problems associated with data currency and currency preservation. The upper and lower bounds of these problems are determined for a variety of query languages. These results are of theoretic value and may be useful to database programmers as questions of the currency of data arise. However, most of the problems explored are intractable. The authors propose further studies using heuristic algorithms and incremental analysis to lower the complexity. A final concern is raised regarding the fundamental differences between data consistency and data currency. This paper is very readable and well organized, and the authors have made a significant intellectual addition to the study of the challenges faced in managing databases that are modified over their lifetimes. Data and database researchers and data administrators will find this paper worthwhile. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 37, Issue 4
December 2012
345 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/2389241
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 December 2012
Accepted: 01 April 2012
Revised: 01 February 2012
Received: 01 October 2011
Published in TODS Volume 37, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Currency
  2. data quality

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)26
  • Downloads (Last 6 weeks)4
Reflects downloads up to 01 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media