10.5555/1996889.1996910
Article

Design and implementation of relevance assessments using crowdsourcing

Published: 18 April 2011

Abstract

In recent years, crowdsourcing has emerged as a viable platform for conducting relevance assessments. The main reason behind this trend is that it makes it possible to conduct experiments extremely fast, with good results, and at low cost. However, as in any experiment, there are several details that can make an experiment work or fail. To gather useful results, user interface guidelines, inter-agreement metrics, and justification analysis are important aspects of a successful crowdsourcing experiment. In this work we explore the design and execution of relevance judgments using Amazon Mechanical Turk as the crowdsourcing platform, introducing a methodology for crowdsourcing relevance assessments and presenting the results of a series of experiments on TREC 8 data with a fixed budget. Our findings indicate that workers are as good as TREC experts, even providing detailed feedback for certain query-document pairs. We also explore the importance of document design and presentation when performing relevance assessment tasks. Finally, we show our methodology at work with several examples that are interesting in their own right.
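As an illustration of the inter-agreement analysis mentioned above, the sketch below aggregates redundant worker votes by majority and compares the result against expert (TREC-style) judgments using raw agreement and Cohen's kappa. This is a minimal sketch, not the paper's actual pipeline: the topic numbers, document identifiers, and vote counts are hypothetical placeholders, and the paper may use different aggregation and agreement measures.

# Minimal sketch: majority-vote aggregation of worker labels and agreement
# with expert qrels. All data below is hypothetical, for illustration only.
from collections import Counter

# worker_labels[(topic, doc)] -> binary relevance votes from several workers
worker_labels = {
    ("401", "FBIS3-1010"): [1, 1, 0],
    ("401", "FBIS3-2024"): [0, 0, 0],
    ("402", "FT921-3000"): [1, 1, 1],
    ("402", "FT921-4005"): [0, 1, 0],
}

# expert_labels[(topic, doc)] -> official qrel judgment (1 = relevant)
expert_labels = {
    ("401", "FBIS3-1010"): 1,
    ("401", "FBIS3-2024"): 0,
    ("402", "FT921-3000"): 1,
    ("402", "FT921-4005"): 1,
}

def majority_vote(votes):
    """Return the most common label among the workers' votes."""
    return Counter(votes).most_common(1)[0][0]

def cohens_kappa(pairs):
    """Chance-corrected agreement between two binary label sequences."""
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    p_a1 = sum(a for a, _ in pairs) / n   # P(first rater says relevant)
    p_b1 = sum(b for _, b in pairs) / n   # P(second rater says relevant)
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

pairs = [(majority_vote(worker_labels[key]), expert_labels[key])
         for key in expert_labels]
agreement = sum(a == b for a, b in pairs) / len(pairs)
print(f"raw agreement: {agreement:.2f}, kappa: {cohens_kappa(pairs):.2f}")

With the toy data above this prints a raw agreement of 0.75 and a kappa of 0.50; the chance-corrected kappa is the more informative figure when relevant documents are rare, as is typical in TREC collections.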




Published In

ECIR'11: Proceedings of the 33rd European Conference on Advances in Information Retrieval
April 2011
792 pages
ISBN: 9783642201608
Editors:
• Paul Clough
• Colum Foley
• Cathal Gurrin
• Hyowon Lee
• Gareth J. F. Jones

Sponsors

• Google Inc.
• CNGL
• SFI
• Microsoft Research
• Yahoo! Labs

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

