10.5555/1996889.1996910
Article

Design and implementation of relevance assessments using crowdsourcing

Published: 18 April 2011

Abstract

In recent years, crowdsourcing has emerged as a viable platform for conducting relevance assessments. The main reason behind this trend is that it makes it possible to conduct experiments extremely fast, with good results, and at low cost. However, as in any experiment, there are several details that can make an experiment work or fail. To gather useful results, user interface guidelines, inter-agreement metrics, and justification analysis are important aspects of a successful crowdsourcing experiment. In this work we explore the design and execution of relevance judgments using Amazon Mechanical Turk as the crowdsourcing platform, introducing a methodology for crowdsourcing relevance assessments and presenting the results of a series of experiments on TREC 8 data with a fixed budget. Our findings indicate that workers are as good as TREC experts, even providing detailed feedback for certain query-document pairs. We also explore the importance of document design and presentation when performing relevance assessment tasks. Finally, we show our methodology at work with several examples that are interesting in their own right.
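As an illustration of the inter-agreement analysis mentioned above, the sketch below aggregates redundant worker votes by majority and compares the result against expert (TREC-style) judgments using raw agreement and Cohen's kappa. This is a minimal sketch, not the paper's actual pipeline: the topic numbers, document identifiers, and vote counts are hypothetical placeholders, and the paper may use different aggregation and agreement measures.

# Minimal sketch: majority-vote aggregation of worker labels and agreement
# with expert qrels. All data below is hypothetical, for illustration only.
from collections import Counter

# worker_labels[(topic, doc)] -> binary relevance votes from several workers
worker_labels = {
    ("401", "FBIS3-1010"): [1, 1, 0],
    ("401", "FBIS3-2024"): [0, 0, 0],
    ("402", "FT921-3000"): [1, 1, 1],
    ("402", "FT921-4005"): [0, 1, 0],
}

# expert_labels[(topic, doc)] -> official qrel judgment (1 = relevant)
expert_labels = {
    ("401", "FBIS3-1010"): 1,
    ("401", "FBIS3-2024"): 0,
    ("402", "FT921-3000"): 1,
    ("402", "FT921-4005"): 1,
}

def majority_vote(votes):
    """Return the most common label among the workers' votes."""
    return Counter(votes).most_common(1)[0][0]

def cohens_kappa(pairs):
    """Chance-corrected agreement between two binary label sequences."""
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    p_a1 = sum(a for a, _ in pairs) / n   # P(first rater says relevant)
    p_b1 = sum(b for _, b in pairs) / n   # P(second rater says relevant)
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

pairs = [(majority_vote(worker_labels[key]), expert_labels[key])
         for key in expert_labels]
agreement = sum(a == b for a, b in pairs) / len(pairs)
print(f"raw agreement: {agreement:.2f}, kappa: {cohens_kappa(pairs):.2f}")

With the toy data above this prints a raw agreement of 0.75 and a kappa of 0.50; the chance-corrected kappa is the more informative figure when relevant documents are rare, as is typical in TREC collections.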




Published In

ECIR'11: Proceedings of the 33rd European Conference on Advances in Information Retrieval
April 2011
792 pages
ISBN: 9783642201608
Editors:
• Paul Clough
• Colum Foley
• Cathal Gurrin
• Hyowon Lee
• Gareth J. F. Jones

Sponsors

• Google Inc.
• CNGL
• SFI
• Microsoft Research
• Yahoo! Labs

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

