
Using titles and category names from editor-driven taxonomies for automatic evaluation

Published: 03 November 2003
DOI: 10.1145/956863.956868

Abstract

Evaluation of IR systems has always been difficult because of the need for manually assessed relevance judgments. The advent of large, editor-driven taxonomies on the web opens the door to a new evaluation approach. We use the ODP (Open Directory Project) taxonomy to find sets of pseudo-relevant documents via one of two assumptions: (1) a taxonomy entry is relevant to a given query if its editor-entered title exactly matches the query, or (2) all entries in a leaf-level taxonomy category are relevant to a given query if the category title exactly matches the query. We compare and contrast these two methodologies by evaluating six web search engines on a sample from an America Online log of ten million web queries, using MRR-based measures for the first method and precision-based measures for the second. We show that this technique is stable with respect to the selected query set and that it correlates with a reasonably large manual evaluation.
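To make the two pseudo-relevance assumptions concrete, here is a minimal sketch, assuming ODP entries are available as (title, url, category_title) tuples; the function names, toy data, and rank cutoff are illustrative assumptions, not the authors' implementation. It builds the pseudo-relevant URL set for a query under either assumption, then scores one engine's ranked results with the per-query quantities behind the paper's MRR-based (method 1) and precision-based (method 2) measures.

    def normalize(text: str) -> str:
        # Case-fold and collapse whitespace so an "exact match" is not
        # defeated by trivial formatting differences.
        return " ".join(text.lower().split())

    def pseudo_relevant(odp_entries, query, method):
        """Pseudo-relevant URLs for `query` from ODP entries.

        Each entry is assumed to be a (title, url, category_title) tuple,
        where category_title is the entry's leaf-level category name.
        method 1: the entry's editor-entered title must exactly match the query.
        method 2: the entry's leaf category title must exactly match the query.
        """
        q = normalize(query)
        if method == 1:
            return {url for title, url, _ in odp_entries if normalize(title) == q}
        return {url for _, url, cat in odp_entries if normalize(cat) == q}

    def reciprocal_rank(ranked_urls, relevant):
        # 1/rank of the first pseudo-relevant result; 0.0 if none is retrieved.
        for rank, url in enumerate(ranked_urls, start=1):
            if url in relevant:
                return 1.0 / rank
        return 0.0

    def precision_at_k(ranked_urls, relevant, k=10):
        # Fraction of the top-k results that are pseudo-relevant.
        return sum(url in relevant for url in ranked_urls[:k]) / k

    # Toy example (hypothetical data): one ODP entry, one engine's ranking.
    odp = [("open directory project", "http://dmoz.org/", "Directories")]
    ranking = ["http://example.com/", "http://dmoz.org/"]
    rel = pseudo_relevant(odp, "Open Directory Project", method=1)
    print(reciprocal_rank(ranking, rel))      # 0.5
    print(precision_at_k(ranking, rel, k=2))  # 0.5

Averaging reciprocal_rank over the sampled queries gives MRR for method 1, and averaging precision_at_k gives the method-2 scores; the paper's exact query filtering and cutoffs are not reproduced here.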



Published In

CIKM '03: Proceedings of the Twelfth International Conference on Information and Knowledge Management, November 2003, 592 pages. ISBN: 1581137230. DOI: 10.1145/956863.
Publisher: Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. automatic evaluation
    2. relevance judgments
    3. web search
