Performance Evaluation of Similarity Measures on Similar and Dissimilar Text Retrieval

Victor U. Thompson; Christo Panchev; Michael Michael Oakes

Research.Publish.Connect.

*Please fill out at least one Field. *Value must be an number!

Title:
ISBN:
Year:
Acronym:
Subject:

Advanced Search Proceedings Search

If you're looking for an exact phrase use quotation marks on text fields.

*Please fill out at least one Field.

Title:
Author:
Affiliation:
Subject:

Advanced Search Papers Search

If you're looking for an exact phrase use quotation marks on text fields.

*Please fill out at least one Field.

Name:
Affiliation:
Country:
Conference:
Subject:

Advanced Search Authors Search

If you're looking for an exact phrase use quotation marks on text fields.

*Please fill out at least one Field.

Name:
Country:
Subject:

Advanced Search Affiliations Search

If you're looking for an exact phrase use quotation marks on text fields.

Proceedings

Proceedings Search *Please fill out at least one Field. *Value must be an number!

Title:
ISBN:
Year:
Acronym:
Subject:

Advanced Search Proceedings Search

If you're looking for an exact phrase use quotation marks on text fields.

Papers

Papers Search *Please fill out at least one Field.

Title:
Author:
Affiliation:
Subject:

Advanced Search Papers Search

If you're looking for an exact phrase use quotation marks on text fields.

Authors

Authors Search *Please fill out at least one Field.

Name:
Affiliation:
Country:
Conference:
Subject:

Advanced Search Authors Search

If you're looking for an exact phrase use quotation marks on text fields.

Advanced Search

Paper

Performance Evaluation of Similarity Measures on Similar and Dissimilar Text Retrieval

Topics: Information Extraction

In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1 KDIR: SSTM, 577-584, 2015 , Lisbon, Portugal

Authors: Victor U. Thompson ; Christo Panchev and Michael Michael Oakes

Affiliation: University of Sunderland, United Kingdom

Keyword(s): Candidate Selection, Information Retrieval, Similarity Measures, Textual Similarity.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Information Extraction ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Symbolic Systems

Abstract: Many Information Retrieval (IR) and Natural language processing (NLP) systems require textual similarity measurement in order to function, and do so with the help of similarity measures. Similarity measures function differently, some measures which work better on highly similar texts do not always do so well on highly dissimilar texts. In this paper, we evaluated the performances of eight popular similarity measures on four levels (degree) of textual similarity using a corpus of plagiarised texts. The evaluation was carried out in the context of candidate selection for plagiarism detection. Performance was measured in terms of recall, and the best performed similarity measure(s) for each degree of textual similarity was identified. Results from our Experiments show that the performances of most of the measures were equal on highly similar texts, with the exception of Euclidean distance and Jensen-Shannon divergence which had poorer performances. Cosine similarity and Bhattacharryan c oefficient performed best on lightly reviewed text, and on heavily reviewed texts, Cosine similarity and Pearson Correlation performed best and next best respectively. Pearson Correlation had the best performance on highly dissimilar texts. The results also show term weighing methods and n-gram document representations that best optimises the performance of each of the similarity measures on a particular level of intertextual similarity. (More)

CC BY-NC-ND 4.0

Guest: Register as new SciTePress user now for free.

SciTePress user: please login.

My Papers

You are not signed in, therefore limits apply to your IP address 74.48.170.251

In the current month:

Recent papers: 100 available of 100 total

2⁺ years older papers: 200 available of 200 total

Paper citation in several formats:

Thompson, V. ; Panchev, C. and Michael Oakes, M. (2015). Performance Evaluation of Similarity Measures on Similar and Dissimilar Text Retrieval. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015) - SSTM; ISBN 978-989-758-158-8; ISSN 2184-3228, SciTePress, pages 577-584. DOI: 10.5220/0005619105770584

@conference{sstm15,
author={Victor U. Thompson and Christo Panchev and Michael {Michael Oakes}},
title={Performance Evaluation of Similarity Measures on Similar and Dissimilar Text Retrieval},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015) - SSTM},
year={2015},
pages={577-584},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005619105770584},
isbn={978-989-758-158-8},
issn={2184-3228},
}

TY - CONF

JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2015) - SSTM
TI - Performance Evaluation of Similarity Measures on Similar and Dissimilar Text Retrieval
SN - 978-989-758-158-8
IS - 2184-3228
AU - Thompson, V.
AU - Panchev, C.
AU - Michael Oakes, M.
PY - 2015
SP - 577
EP - 584
DO - 10.5220/0005619105770584
PB - SciTePress