
Using titles and category names from editor-driven taxonomies for automatic evaluation

Published: 03 November 2003
DOI: 10.1145/956863.956868

Abstract

Evaluation of IR systems has always been difficult because of the need for manually assessed relevance judgments. The advent of large, editor-driven taxonomies on the web opens the door to a new evaluation approach. We use the ODP (Open Directory Project) taxonomy to find sets of pseudo-relevant documents via one of two assumptions: (1) a taxonomy entry is relevant to a given query if its editor-entered title exactly matches the query, or (2) all entries in a leaf-level taxonomy category are relevant to a given query if the category title exactly matches the query. We compare and contrast these two methodologies by evaluating six web search engines on a sample from an America Online log of ten million web queries, using MRR-based measures for the first method and precision-based measures for the second. We show that this technique is stable with respect to the selected query set and that it correlates with a reasonably large manual evaluation.
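To make the two pseudo-relevance assumptions concrete, here is a minimal sketch, assuming ODP entries are available as (title, url, category_title) tuples; the function names, toy data, and rank cutoff are illustrative assumptions, not the authors' implementation. It builds the pseudo-relevant URL set for a query under either assumption, then scores one engine's ranked results with the per-query quantities behind the paper's MRR-based (method 1) and precision-based (method 2) measures.

    def normalize(text: str) -> str:
        # Case-fold and collapse whitespace so an "exact match" is not
        # defeated by trivial formatting differences.
        return " ".join(text.lower().split())

    def pseudo_relevant(odp_entries, query, method):
        """Pseudo-relevant URLs for `query` from ODP entries.

        Each entry is assumed to be a (title, url, category_title) tuple,
        where category_title is the entry's leaf-level category name.
        method 1: the entry's editor-entered title must exactly match the query.
        method 2: the entry's leaf category title must exactly match the query.
        """
        q = normalize(query)
        if method == 1:
            return {url for title, url, _ in odp_entries if normalize(title) == q}
        return {url for _, url, cat in odp_entries if normalize(cat) == q}

    def reciprocal_rank(ranked_urls, relevant):
        # 1/rank of the first pseudo-relevant result; 0.0 if none is retrieved.
        for rank, url in enumerate(ranked_urls, start=1):
            if url in relevant:
                return 1.0 / rank
        return 0.0

    def precision_at_k(ranked_urls, relevant, k=10):
        # Fraction of the top-k results that are pseudo-relevant.
        return sum(url in relevant for url in ranked_urls[:k]) / k

    # Toy example (hypothetical data): one ODP entry, one engine's ranking.
    odp = [("open directory project", "http://dmoz.org/", "Directories")]
    ranking = ["http://example.com/", "http://dmoz.org/"]
    rel = pseudo_relevant(odp, "Open Directory Project", method=1)
    print(reciprocal_rank(ranking, rel))      # 0.5
    print(precision_at_k(ranking, rel, k=2))  # 0.5

Averaging reciprocal_rank over the sampled queries gives MRR for method 1, and averaging precision_at_k gives the method-2 scores; the paper's exact query filtering and cutoffs are not reproduced here.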



Published In

CIKM '03: Proceedings of the Twelfth International Conference on Information and Knowledge Management, November 2003, 592 pages. ISBN: 1581137230. DOI: 10.1145/956863.
Publisher: Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. automatic evaluation
    2. relevance judgments
    3. web search
