DOI: 10.1145/1353343.1353358
research-article

Schema mapping verification: the spicy way

Published: 25 March 2008

Abstract

Schema mapping algorithms rely on value correspondences - i.e., correspondences among semantically related attributes - to produce complex transformations among data sources. These correspondences are either manually specified or suggested by separate modules called schema matchers. The quality of mappings produced by a mapping generation tool strongly depends on the quality of the input correspondences. In this paper, we introduce the Spicy system, a novel approach to the problem of verifying the quality of mappings. Spicy is based on a three-layer architecture, in which a schema matching module is used to provide input to a mapping generation module. Then, a third module, the mapping verification module, is used to check candidate mappings and choose the ones that represent better transformations of the source into the target. At the core of the system stands a new technique for comparing the structure and actual content of trees, called structural analysis. Experimental results show that, by carefully designing the comparison algorithm, it is possible to achieve both good scalability and high precision in mapping selection.
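
As an illustration only, the three-layer flow described in the abstract (a matcher feeding a mapping generator, followed by a verifier that ranks candidate mappings) could be sketched as below. The function names, the toy data, and the Jaccard-based score are all assumptions made for this example. In particular, the score is a simple stand-in for comparing translated source data with target data; it is not the structural analysis technique that the paper introduces.

```python
from itertools import product

# Hypothetical toy instances: attribute name -> list of sample values.
SOURCE = {
    "name": ["Ann", "Bob", "Cy"],
    "city": ["Rome", "Oslo", "Rome"],
}
TARGET = {
    "fullName": ["Ann", "Bob", "Dee"],
    "town": ["Rome", "Oslo", "Paris"],
}


def match(source, target):
    """Layer 1 (stand-in matcher): propose every source attribute as a
    candidate correspondence for every target attribute."""
    return {t: list(source) for t in target}


def generate_mappings(candidates):
    """Layer 2 (stand-in generator): yield one candidate mapping per way of
    choosing a source attribute for each target attribute."""
    targets = list(candidates)
    for choice in product(*(candidates[t] for t in targets)):
        yield dict(zip(targets, choice))


def jaccard(a, b):
    """Jaccard similarity of two value collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def verify(mapping, source, target):
    """Layer 3 (stand-in verifier): average value overlap between each
    target attribute and the source attribute mapped onto it."""
    scores = [jaccard(source[s], target[t]) for t, s in mapping.items()]
    return sum(scores) / len(scores)


def rank_mappings(source, target):
    """Run the three layers and return (score, mapping) pairs, best first."""
    candidates = match(source, target)
    scored = [(verify(m, source, target), m) for m in generate_mappings(candidates)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


best_score, best_mapping = rank_mappings(SOURCE, TARGET)[0]
print(best_mapping, round(best_score, 3))
```

The sketch enumerates every candidate mapping, which is only workable for tiny schemas; a real system would prune candidates using the matcher's confidence scores before verification.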

Published In

EDBT '08: Proceedings of the 11th international conference on Extending database technology: Advances in database technology
March 2008
762 pages
ISBN:9781595939265
DOI:10.1145/1353343

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%
