Research Article
Effective level of term frequency impact on large-scale retrieval performance: by top-term ranking method
@INPROCEEDINGS{10.1145/1146847.1146884,
  author={Soheila KARBASI and Mohand BOUGHANEM},
  title={Effective level of term frequency impact on large-scale retrieval performance: by top-term ranking method},
  proceedings={1st International ICST Conference on Scalable Information Systems},
  publisher={ACM},
  proceedings_a={INFOSCALE},
  year={2006},
  month={6},
  keywords={large collection; document length normalization; effective level of term frequency; Top-Term Ranking method},
  doi={10.1145/1146847.1146884}
}
Soheila KARBASI
Mohand BOUGHANEM
Year: 2006
INFOSCALE
ACM
DOI: 10.1145/1146847.1146884
Abstract
As the volume of information grows, effective information retrieval methods become increasingly essential. This paper develops a new method to assess the potential role of the term frequency-inverse document frequency (tf-idf) measures commonly used by text retrieval systems based on the vector space model. We carried out preliminary tests to determine the effect of each term-weighting component on retrieval performance in a basic vector space scheme. Based on these tests, we identify a novel factor, the effective level of term frequency (EL), that represents a document's content based on its length and maximum term frequency. This factor is used to find the principal terms within each document and an appropriate subset of documents containing the query terms. Our proposed method, Top-Term Ranking, uses a reduced indexing view of the original terms, in which only the principal terms of each document are considered for weighting. Our experiments on TREC collections indicate that EL is a significant factor in retrieving relevant documents, especially in large collections. The aim of the Top-Term Ranking method is to improve the performance of large-scale information retrieval systems beyond that of common vector space methods.
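To make the idea of a reduced indexing view concrete, here is a minimal sketch of tf-idf ranking in a basic vector space model, with an optional pruning step that keeps only a few terms per document. The abstract does not give the formula for the effective level of term frequency, so the `top_k` cutoff below (keep the `top_k` most frequent terms) is only a hypothetical stand-in for the paper's EL-based selection. The function names `build_index` and `rank` are illustrative as well.

```python
import math
from collections import Counter


def build_index(docs, top_k=None):
    """Return (per-document tf-idf weights, idf table).

    ASSUMPTION: top_k is a placeholder for the paper's EL-based choice of
    principal terms, whose formula is not stated in the abstract. When set,
    only the top_k most frequent terms of each document are indexed.
    """
    term_counts = [Counter(text.lower().split()) for text in docs]
    if top_k is not None:
        # Reduced indexing view: drop all but the most frequent terms.
        term_counts = [Counter(dict(tc.most_common(top_k))) for tc in term_counts]
    n_docs = len(docs)
    # Document frequency is computed over the (possibly reduced) index.
    doc_freq = Counter(term for tc in term_counts for term in tc)
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    weights = [{t: tf * idf[t] for t, tf in tc.items()} for tc in term_counts]
    return weights, idf


def rank(query, weights, idf):
    """Rank documents by cosine similarity between tf-idf vectors."""
    q_counts = Counter(query.lower().split())
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    scores = []
    for doc_id, d_vec in enumerate(weights):
        dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        d_norm = math.sqrt(sum(w * w for w in d_vec.values()))
        score = dot / (q_norm * d_norm) if q_norm and d_norm else 0.0
        scores.append((doc_id, score))
    # Highest score first; ties keep document order (sorted is stable).
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "retrieval retrieval retrieval index",
    "index index term",
    "term weighting scheme",
]
full_weights, full_idf = build_index(docs)
reduced_weights, reduced_idf = build_index(docs, top_k=1)
```

With `top_k=1`, the first document is indexed only under `retrieval`, so a query for `index` no longer matches it. This is the kind of trade-off a principal-term selection has to manage: a smaller index and less noise, at the risk of dropping terms that a query might need.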