Matching large schemas: Approaches and evaluation

Published: 01 September 2007

Abstract

Current schema matching approaches still need to improve for large and complex schemas. The large search space increases both the likelihood of false matches and execution times. Further difficulties for schema matching arise from the high expressive power and versatility of modern schema languages, in particular user-defined types and classes, component reuse capabilities, and support for distributed schemas and namespaces. To better assist the user in matching complex schemas, we have developed a new generic schema matching tool, COMA++, which provides a library of individual matchers and a flexible infrastructure to combine the matchers and refine their results. Different match strategies can be applied, including a new scalable approach to identify context-dependent correspondences between schemas with shared elements and a fragment-based match approach that decomposes a large match task into smaller tasks. We conducted a comprehensive evaluation of the match strategies using large e-Business standard schemas. Besides providing helpful insights for future match implementations, the evaluation demonstrated the practicability of our system for matching large schemas.
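The idea of combining individual matchers can be illustrated with a minimal sketch. This is not the COMA++ implementation or its API: the two toy matchers (`name_similarity`, `type_similarity`), the averaging step, the 0.6 threshold, and the example schemas are all assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Hypothetical name matcher: string similarity of element names in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_similarity(a: str, b: str) -> float:
    """Hypothetical type matcher: 1.0 for identical data types, else 0.0."""
    return 1.0 if a == b else 0.0


def match(source: dict, target: dict, threshold: float = 0.6) -> list:
    """Combine both matchers by averaging and keep pairs above the threshold.

    `source` and `target` map element names to data types (an assumed,
    simplified schema representation).
    """
    result = []
    for s_name, s_type in source.items():
        for t_name, t_type in target.items():
            scores = [
                name_similarity(s_name, t_name),
                type_similarity(s_type, t_type),
            ]
            combined = sum(scores) / len(scores)  # simple average aggregation
            if combined >= threshold:
                result.append((s_name, t_name, round(combined, 2)))
    # Best-scoring correspondences first
    return sorted(result, key=lambda r: -r[2])


if __name__ == "__main__":
    source = {"custName": "string", "orderDate": "date"}
    target = {"customerName": "string", "date": "date", "total": "decimal"}
    for s, t, score in match(source, target):
        print(s, "->", t, score)
```

The nested loop compares every source element with every target element, which shows why the search space grows quickly for large schemas. A fragment-based strategy of the kind the abstract mentions would restrict such comparisons to selected pairs of schema fragments.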

Published In

Information Systems, Volume 32, Issue 6
September, 2007
148 pages

Publisher

Elsevier Science Ltd.

United Kingdom

Author Tags

  1. Data integration
  2. Model management
  3. Schema integration
  4. Schema matching
  5. Schema matching evaluation
