research-article

Instance matching benchmarks in the era of Linked Data

Published: 01 August 2016

Abstract

The goal of this survey is to present the state of the art in instance matching benchmarks for Linked Data. We introduce the principles of benchmark design for instance matching systems, discuss the dimensions and characteristics of an instance matching benchmark, and provide a comprehensive overview of existing benchmarks and benchmark generators. We then discuss their advantages and disadvantages, as well as the research directions that should be explored to create novel benchmarks that answer the needs of the Linked Data paradigm.


Published In

Web Semantics: Science, Services and Agents on the World Wide Web  Volume 39, Issue C
August 2016
97 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

