DOI: 10.1145/3531146.3534647
research-article
Open access

CrowdWorkSheets: Accounting for Individual and Collective Identities Underlying Crowdsourced Dataset Annotation

Published: 20 June 2022

Abstract

Human-annotated data plays a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into dataset annotation have not received nearly enough attention. In this paper, we survey an array of literature that provides insights into ethical considerations around crowdsourced dataset annotation. We synthesize these insights and lay out the challenges in this space along two layers: (1) who the annotator is, and how the annotators’ lived experiences can impact their annotations, and (2) the relationship between the annotators and the crowdsourcing platforms, and what that relationship affords them. Finally, we introduce a novel framework, CrowdWorkSheets, for dataset developers to facilitate transparent documentation of key decision points at various stages of the data annotation pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset release and maintenance.
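
The sketch below is a rough, editor-added illustration (not part of the paper) of how the five documentation stages named in the abstract might be captured as a structured record; all field names are invented for illustration, and the paper itself specifies the actual CrowdWorkSheets question set.

```python
# Hypothetical sketch: one way to represent a CrowdWorkSheets-style documentation
# record for an annotated dataset. Field names are illustrative assumptions only;
# the published framework defines the real questions for each stage.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CrowdWorkSheet:
    # (1) Task formulation: what annotators are asked to do and how it is framed.
    task_description: str = ""
    annotation_guidelines_url: str = ""

    # (2) Selection of annotators: recruitment and what background information is recorded.
    annotator_recruitment: str = ""
    annotator_demographics_collected: List[str] = field(default_factory=list)

    # (3) Platform and infrastructure choices: where the work happens and under what terms.
    platform: str = ""
    compensation_policy: str = ""

    # (4) Dataset analysis and evaluation: how quality and disagreement are assessed.
    agreement_metrics: Dict[str, float] = field(default_factory=dict)
    disagreement_handling: str = ""

    # (5) Dataset release and maintenance: what is released and how it is kept up to date.
    annotator_level_labels_released: bool = False
    maintenance_plan: str = ""


# Example usage with placeholder values.
sheet = CrowdWorkSheet(
    task_description="Rate the toxicity of a comment on a 4-point scale.",
    platform="Hypothetical crowdsourcing platform",
    agreement_metrics={"fleiss_kappa": 0.41},
    annotator_level_labels_released=True,
)
```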

        Published In

        FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
        June 2022
        2351 pages
ISBN: 9781450393522
DOI: 10.1145/3531146
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        FAccT '22
