DOI: 10.1145/956863.956867
Article

A study of parameter tuning for term frequency normalization

Published: 03 November 2003

Abstract

Most current term frequency normalization approaches for information retrieval involve the use of parameters. The tuning of these parameters has a significant impact on the overall performance of the information retrieval system. Indeed, a small variation in the parameter(s) involved can lead to a large variation in the precision/recall values. Most current tuning approaches depend on the document collection. As a consequence, an effective parameter value cannot be obtained for a new collection without extensive training data. In this paper, we propose a novel and robust method for tuning term frequency normalization parameter(s) by measuring the normalization effect on the within-document frequency of the query terms. As an illustration, we apply our method to Amati & Van Rijsbergen's so-called normalization 2. The experiments for the ad-hoc TREC-6,7,8 tasks and TREC-8,9,10 Web tracks show that the new method is independent of the collections and provides reliable and good performance.
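To make the role of the parameter concrete, the following is a minimal sketch of normalization 2 as it is commonly stated in the divergence-from-randomness literature, tfn = tf · log2(1 + c · avg_l / l). It is not the tuning method proposed in the paper. The function name, argument names, and the sample values are assumptions chosen only for illustration.

```python
import math


def normalization_2(tf, doc_len, avg_doc_len, c=1.0):
    """Return the normalized term frequency tf * log2(1 + c * avg_doc_len / doc_len).

    tf          -- raw within-document frequency of the term
    doc_len     -- length of the document (in tokens)
    avg_doc_len -- average document length in the collection
    c           -- the free parameter that has to be tuned
    """
    if doc_len <= 0 or avg_doc_len <= 0 or c <= 0:
        raise ValueError("doc_len, avg_doc_len and c must be positive")
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


if __name__ == "__main__":
    # Hypothetical collection statistics, used only to show the effect of c.
    avg_len = 200
    for c in (0.5, 1.0, 2.0, 7.0):
        short_doc = normalization_2(tf=3, doc_len=100, avg_doc_len=avg_len, c=c)
        long_doc = normalization_2(tf=3, doc_len=400, avg_doc_len=avg_len, c=c)
        print(f"c={c}: short doc tfn={short_doc:.3f}, long doc tfn={long_doc:.3f}")
```

Running the loop shows how the same raw frequency is rewarded in a short document and penalized in a long one, and how strongly the resulting values shift as c changes, which is the sensitivity the abstract refers to.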

References

[1]
G. Amati. Probabilistic Models for Information Retrieval based on Divergence from Randomness. PhD thesis, Department of Computing Science, University of Glasgow, 2003.
[2]
G. Amati and C. J. V. Rijsbergen. Probabilistic models of information retrieval based on measuring the divergence from randomness. In ACM Transactions on Information Systems (TOIS), volume 20(4), pages 357--389, October 2002.
[3]
G. Amati and C. J. V. Rijsbergen. Term frequency normalization via Pareto distributions. In Advances in Information Retrieval, 24th BCS-IRSG European Colloquium on IR Research, Glasgow, UK, March 25-27, 2002, Proceedings, volume 2291 of Lecture Notes in Computer Science, pages 183--192. Springer, 2002.
[4]
A. Chowdhury, M. C. McCabe, D. Grossman, and O. Frieder. Document normalization revisited. In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pages 381--382, 2002.
[5]
S. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, pages 232--241, 1994.
[6]
S. Robertson, S. Walker, M. M. Beaulieu, M. Gatford, and A. Payne. Okapi at TREC-4. In NIST Special Publication 500-236: The Fourth Text REtrieval Conference (TREC-4), pages 73--96, 1995.
[7]
G. Salton, A. Wong, and C. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613--620, November 1975.
[8]
A. Singhal, C. Buckley, and M. Mitra. Pivoted document length normalization. In Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, pages 21--29, 1996.
[9]
K. Sparck-Jones, S. Walker, and S. E. Robertson. A probabilistic model of information retrieval: Development and comparative experiments. Information Processing and Management, 36(6):779--840, 2000.
[10]
C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages 334--342, 2001.


      Published In

      CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management
      November 2003
      592 pages
      ISBN:1581137230
      DOI:10.1145/956863

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. document length
      2. information retrieval
      3. parameter tuning
      4. term frequency normalization

      Qualifiers

      • Article

      Conference

      CIKM03

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
