skip to main content
article
Free access

Meaningful change detection in structured data

Published: 01 June 1997 Publication History

Abstract

Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This problem is much more challenging than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional “atomic” insert, delete, update operations, but also on operations that move an entire sub-tree of nodes, and that copy an entire sub-tree. These operations allows us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, in this paper we present a heuristic change detection algorithm that yields close to “minimal” descriptions of the changes, and that has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of computing a minimum-cost edge cover of a bipartite graph. We study the quality of the solution produced by our algorithm, as well as the running time, both analytically and experimentally.

References

[1]
S. Chawathe and H. Garcia-Molina. Meaningful change detection in structured data. Available at URL http://w~ra-db, stanford, edu, 1997. Extended version.
[2]
S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. The Tsimmis project: Integration of heterogeneous information sources. In Proceedings of lOOth Anniversary Meeting of the information Processing Society of Japan, pages 7-18, Tokyo, Japan, October 1994.
[3]
S. Chawathe, A. Rajaraman, H. Garcia- Mofina, and J. Widom. Change detection in hierarchically structured information. In Proceedings o} the A CM SIGMOD International Conference on Management o} Data, pages 493-504, Montreal, Quebec, June 1996.
[4]
M. Haertel, D. Hayes, R. Stallman, L. Tower, P. Eggert., and W. Davison. The GNU diff program. Texinfo system documentation. Available by anonymous FTP from prep. ai .mit. edu.
[5]
E. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart and Winston, 1976.
[6]
W. Labio and H. Garcia-Molina. Efficient snapshot differential algorithms for data warehousing. In Proceedings of the International Conference on Very Large Data Bases, Bombay, India, September 1996.
[7]
E. Myers. An O(UO) difference algorithm and its variations. Algorithmica, 1(2):251-266, 1986.
[8]
C. Papadimitriou and K. Steiglitz. Combinatorial Optimization. Prentice-Hall, 1982.
[9]
E. Rothberg. The wmatch program for finding a maximum-weight matching for undirected graphs. Live OR collection. Available at URL http ://w~. orsoc, org. uk.
[10]
D. Shasha, J. Wang, K. Zhang, and F. Shih. Exact and approximate algorithms for unordered tree matching. IEEE Transactions on Systems, Man, and Cybernetics, 24(4):668-678, April 1994.
[11]
D. Shasha and K. Zhang. Fast algorithms for the unit cost editing distance between trees. Journal o} Algorithms, 11:581-621, 1990.
[12]
R. Wagner. On the complexity of the extended string-to-string correction problem. In Seventh A CM Symposium on the Theory o} Computation, 1975.
[13]
R. Wagner and M. Fischer. The string-to-string correction problem. Journal o} the Association of Computing Machinery, 21(1):168-173, January 1974.
[14]
S. Wu, U. Manber, and G.Myers. An O(NP) sequence comparison algorithm, ln}ormation Processing Letters, 35:317-323, September 1990.
[15]
J. Widom and J. Ullman. The C3 project: Changes, consistency, and configurations in heterogeneous distributed information systems. Unpublished manuscript; available at URL http://www-db, stan~ord, edu, 1995.
[16]
K. Zhang and D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal of Computing, 18(6):1245-1262, 1989.
[17]
K. Zhang, J. Wang, and D. Shasha. On the editing distance between undirected acyclic graphs. International Journal of Foundations of Computer Science, 1995.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 26, Issue 2
June 1997
583 pages
ISSN:0163-5808
DOI:10.1145/253262
Issue’s Table of Contents
  • cover image ACM Conferences
    SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data
    June 1997
    594 pages
    ISBN:0897919114
    DOI:10.1145/253260
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1997
Published in SIGMOD Volume 26, Issue 2

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)155
  • Downloads (Last 6 weeks)26
Reflects downloads up to 06 Nov 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media