
Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata

Published: 11 November 2022

Abstract

Data is central to the development and evaluation of machine learning (ML) models. However, the use of problematic or inappropriate datasets can result in harms when the resulting models are deployed. To encourage responsible AI practice through more deliberate reflection on datasets and transparency around the processes by which they are created, researchers and practitioners have begun to advocate for increased data documentation and have proposed several data documentation frameworks. However, there is little research on whether these data documentation frameworks meet the needs of ML practitioners, who both create and consume datasets. To address this gap, we set out to understand ML practitioners' data documentation perceptions, needs, challenges, and desiderata, with the ultimate goal of deriving design requirements that can inform future data documentation frameworks. We conducted a series of semi-structured interviews with 14 ML practitioners at a single large, international technology company, asking each of them to answer a list of questions taken from datasheets for datasets [19]. Our findings show that current approaches to data documentation are largely ad hoc and myopic in nature. Participants expressed needs for data documentation frameworks to be adaptable to their contexts, integrated into their existing tools and workflows, and automated wherever possible. Although data documentation frameworks are often motivated from the perspective of responsible AI, participants did not make the connection between the questions that they were asked to answer and their responsible AI implications. In addition, participants often had difficulties prioritizing the needs of dataset consumers and providing information that someone unfamiliar with their datasets might need to know. Based on these findings, we derive seven design requirements for future data documentation frameworks, such as more actionable guidance on how the characteristics of datasets might result in harms and how these harms might be mitigated, more explicit prompts for reflection, automated adaptation to different contexts, and integration into ML practitioners' existing tools and workflows.
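To make the last two design requirements concrete, the sketch below shows one hypothetical way datasheet-style prompts could be embedded directly in an ML workflow: descriptive statistics are filled in automatically, and empty free-text answers block the pipeline as an explicit prompt for reflection. This is an illustrative example, not an artifact from the paper; the field names loosely follow the question categories of datasheets for datasets [19], and all class and function names are invented for this sketch.

    # Hypothetical sketch: embedding datasheet-style prompts in an ML workflow.
    # Field names loosely follow the question categories in datasheets for
    # datasets [19]; everything else here is invented for illustration.
    from dataclasses import dataclass, asdict
    import json


    @dataclass
    class Datasheet:
        """Structured answers to a small subset of datasheet-style questions."""
        motivation: str          # Why was the dataset created?
        composition: str         # What do the instances represent?
        collection_process: str  # How was the data collected, and by whom?
        known_limitations: str   # Known gaps, skews, or potential harms
        recommended_uses: str    # Uses the creators consider appropriate
        num_instances: int = 0   # Filled in automatically below


    def document_dataset(records, sheet, path):
        """Validate and save documentation alongside a dataset.

        Quantitative fields are computed from the data itself (automation),
        while empty free-text answers raise an error (a prompt for
        reflection) instead of silently shipping an incomplete datasheet.
        """
        sheet.num_instances = len(records)  # automated, not hand-entered
        missing = [k for k, v in asdict(sheet).items() if v == ""]
        if missing:
            raise ValueError(f"Datasheet incomplete; please answer: {missing}")
        with open(path, "w") as f:
            json.dump(asdict(sheet), f, indent=2)


    # Example usage with a toy dataset.
    records = [{"text": "example", "label": 0}]
    sheet = Datasheet(
        motivation="Toy example for illustration.",
        composition="Short English text snippets with binary labels.",
        collection_process="Hand-written by the dataset creator.",
        known_limitations="Not representative of any real population.",
        recommended_uses="Demonstration only.",
    )
    document_dataset(records, sheet, "datasheet.json")

Coupling the datasheet to the code that builds or loads the dataset is one way the "integration into existing tools and workflows" requirement could be realized, since the documentation then travels with the data rather than drifting out of date in a separate document.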

Supplementary Material

PDF File (v6cscw2340aux.pdf)
This is the form provided to participants in the study presented in the CSCW 2022 paper "Understanding ML Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata" by Amy K. Heger et al.

References

[1]
Saleema Amershi, Andrew Begel, Christian Bird, Robert DeLine, Harald Gall, Ece Kamar, Nachiappan Nagappan, Besmira Nushi, and Thomas Zimmermann. 2019. Software engineering for machine learning: A case study. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 291--300.
[2]
Matthew Arnold, Rachel KE Bellamy, Michael Hind, Stephanie Houde, Sameep Mehta, Aleksandra Mojsilović, Ravi Nair, K Natesan Ramamurthy, Alexandra Olteanu, David Piorkowski, et al. 2019. FactSheets: Increasing trust in AI services through supplier's declarations of conformity. IBM Journal of Research and Development, Vol. 63, 4/5 (2019), 6:1--6:13.
[3]
Lora Aroyo and Chris Welty. 2015. Truth Is a Lie: Crowd Truth and the Seven Myths of Human Annotation. AI Magazine, Vol. 36, 1 (2015), 15--24.
[4]
Solon Barocas, Asia J. Biega, Benjamin Fish, Jędrzej Niklas, and Luke Stark. 2020. When Not to Design, Build, or Deploy (CRAFT session). In Proceedings of the Conference on Fairness, Accountability, and Transparency.
[5]
Solon Barocas, Anhong Guo, Ece Kamar, Jacquelyn Krones, Meredith Ringel Morris, Jennifer Wortman Vaughan, Duncan Wadsworth, and Hanna Wallach. 2021. Designing Disaggregated Evaluations of AI Systems: Choices, Considerations, and Tradeoffs. In Proceedings of the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES).
[6]
Eric P. S. Baumer and M. Six Silberman. 2011. When the Implication Is Not to Design (Technology). In Proceedings of the ACM International Conference on Human Factors in Computing Systems (CHI). 2271--2274.
[7]
Emily M Bender and Batya Friedman. 2018. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, Vol. 6 (2018), 587--604.
[8]
Margarita Boyarskaya, Alexandra Olteanu, and Kate Crawford. 2020. Overcoming Failures of Imagination in AI Infused System Development and Deployment. In NeurIPS Workshop on Navigating the Broader Impacts of AI Research.
[9]
Karen Boyd. 2020. Ethical Sensitivity in Machine Learning Development. In Conference Companion Publication of the 2020 on Computer Supported Cooperative Work and Social Computing. 87--92.
[10]
Karen L. Boyd. 2021. Datasheets for Datasets Help ML Engineers Notice and Understand Ethical Issues in Training Data. Proceedings of the ACM on Human-Computer Interaction, Vol. 5, CSCW2, Article 438, 27 pages.
[11]
Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative Research in Psychology, Vol. 3, 2 (2006), 77--101. https://doi.org/10.1191/1478088706qp063oa
[12]
Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of the Conference on Fairness, Accountability and Transparency (FAT*). 77--91.
[13]
Yang Trista Cao and Hal Daumé III. 2020. Toward Gender-Inclusive Coreference Resolution. In Proceedings of the Conference of the Association for Computational Linguistics (ACL).
[14]
Kasia S. Chmielinski, Sarah Newman, Matt Taylor, Josh Joseph, Kemi Thomas, Jessica Yurkofsky, and Yue Chelsea Qiu. 2020. The Dataset Nutrition Label (2nd Gen): Leveraging Context to Mitigate Harms in Artificial Intelligence. In NeurIPS Workshop on Dataset Curation and Security.
[15]
Alexandra Chouldechova. 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, Vol. 5, 2 (2017), 153--163.
[16]
Kevin Crowston, Jeffery S Saltz, Amira Rezgui, Yatish Hegde, and Sangseok You. 2019. MIDST: A System to Support Stigmergic Coordination in Data-Science Teams. In Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing. 5--8.
[17]
Terrance de Vries, Ishan Misra, Changhan Wang, and Laurens van der Maaten. 2019. Does Object Recognition Work for Everyone? In CVPR Workshop on Computer Vision for Global Challenges. 52--59.
[18]
Casey Fiesler. 2021. Innovating Like an Optimist, Preparing Like a Pessimist: Ethical Speculation and the Legal Imagination. Colorado Technology Law Journal, Vol. 19, 1 (2021).
[19]
Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021. Datasheets for datasets. Commun. ACM, Vol. 64, 12 (December 2021), 86--92.
[20]
Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, and Felix A. Wichmann. 2020. Shortcut Learning in Deep Neural Networks. Nature Machine Intelligence, Vol. 2 (2020), 665--673.
[21]
Zoltán Gócza. 2015. Myth #21: People can tell you what they want. https://uxmyths.com/post/746610684/myth-21-people-can-tell-you-what-they-want/
[22]
Philipp Hacker. 2018. Teaching fairness to artificial intelligence: existing and novel strategies against algorithmic discrimination under EU law. Common Market Law Review, Vol. 55, 4 (2018).
[23]
MD Romael Haque, Katherine Weathington, and Shion Guha. 2019. Exploring the Impact of (Not) Changing Default Settings in Algorithmic Crime Mapping--A Case Study of Milwaukee, Wisconsin. In Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing. 206--210.
[24]
Fred Hohman, Kanit Wongsuphasawat, Mary Beth Kery, and Kayur Patel. 2020. Understanding and Visualizing Data Iteration in Machine Learning. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. ACM, 1--13.
[25]
Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia Chmielinski. 2018. The dataset nutrition label: A framework to drive higher data quality standards. arXiv preprint arXiv:1805.03677 (2018).
[26]
Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudik, and Hanna Wallach. 2019. Improving fairness in machine learning systems: What do industry practitioners need? In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1--16.
[27]
Youyang Hou and Dakuo Wang. 2017. Hacking with NPOs: collaborative analytics and broker roles in civic data hackathons. Proceedings of the ACM on Human-Computer Interaction, Vol. 1, CSCW (2017), 1--16.
[28]
Christoph Hube, Besnik Fetahu, and Ujwal Gadiraju. 2019. Understanding and Mitigating Worker Biases in the Crowdsourced Collection of Subjective Judgments. In Proceedings of the ACM International Conference on Human Factors in Computing Systems (CHI). 1--12.
[29]
Ben Hutchinson, Andrew Smart, Alex Hanna, Emily Denton, Christina Greer, Oddur Kjartansson, Parker Barnes, and Margaret Mitchell. 2021. Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure. In ACM Conference on Fairness, Accountability and Transparency (FAccT). 560--575.
[30]
Anna Jobin, Marcello Ienca, and Effy Vayena. 2019. The global landscape of AI ethics guidelines. Nature Machine Intelligence, Vol. 1 (2019), 389--399.
[31]
Yannis Katsis and Christine T Wolf. 2019. ModelLens: An Interactive System to Support the Model Improvement Practices of Data Science Teams. In Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing. 9--13.
[32]
PM Krafft, Meg Young, Michael Katell, Karen Huang, and Ghislain Bugingo. 2020. Defining AI in policy versus practice. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 72--78.
[33]
Charlotte P Lee. 2007. Boundary negotiating artifacts: Unbinding the routine of boundary objects and embracing chaos in collaborative work. Computer Supported Cooperative Work (CSCW), Vol. 16, 3 (2007), 307--339.
[34]
Min Kyung Lee and Kate Rich. 2021. Who Is Included in Human Perceptions of AI?: Trust and Perceived Fairness around Healthcare AI and Cultural Mistrust. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. 1--14.
[35]
Michael A Madaio, Luke Stark, Jennifer Wortman Vaughan, and Hanna Wallach. 2020. Co-designing checklists to understand organizational challenges and opportunities around fairness in AI. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1--14.
[36]
Yaoli Mao, Dakuo Wang, Michael Muller, Kush R Varshney, Ioana Baldini, Casey Dugan, and Aleksandra Mojsilović. 2019. How data scientists work together with domain experts in scientific collaborations: To find the right answer or to ask the right question? Proceedings of the ACM on Human-Computer Interaction, Vol. 3, GROUP (2019), 1--23.
[37]
Joseph A Maxwell. 2012. Qualitative research design: An interactive approach. Vol. 41. Sage Publications.
[38]
Milagros Miceli, Martin Schuessler, and Tianling Yang. 2020. Between Subjectivity and Imposition: Power Dynamics in Data Annotation for Computer Vision. Proceedings of the ACM on Human-Computer Interaction, Vol. 4, CSCW2 (2020), 1--25.
[39]
Milagros Miceli, Tianling Yang, Laurens Naudts, Martin Schuessler, Diana Serbanescu, and Alex Hanna. 2021. Documenting Computer Vision Datasets: An Invitation to Reflexive Data Practices. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 161--172.
[40]
Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 220--229.
[41]
Brent Mittelstadt. 2019. Principles Alone Cannot Guarantee Ethical AI. Nature Machine Intelligence, Vol. 1 (2019), 501--507.
[42]
Michael Muller, Ingrid Lange, Dakuo Wang, David Piorkowski, Jason Tsay, Q Vera Liao, Casey Dugan, and Thomas Erickson. 2019. How data science workers work with data: Discovery, capture, curation, design, creation. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1--15.
[43]
Andrew B Neang, Will Sutherland, Michael W Beach, and Charlotte P Lee. 2021. Data Integration as Coordination: The Articulation of Data Work in an Ocean Science Collaboration. Proceedings of the ACM on Human-Computer Interaction, Vol. 4, CSCW3 (2021), 1--25.
[44]
Jakob Nielsen. 2001. First Rule of Usability? Don't Listen to Users. https://www.nngroup.com/articles/first-rule-of-usability-dont-listen-to-users/
[45]
The Partnership on AI. 2019. Annotation and Benchmarking on Understanding and Transparency of Machine Learning Lifecycles (ABOUT ML), Version 0. Technical Report.
[46]
Samir Passi and Steven J Jackson. 2018. Trust in data science: Collaboration, translation, and accountability in corporate data science projects. Proceedings of the ACM on Human-Computer Interaction, Vol. 2, CSCW (2018), 1--28.
[47]
Kayur Patel, Naomi Bancroft, Steven M Drucker, James Fogarty, Andrew J Ko, and James Landay. 2010. Gestalt: Integrated support for implementation and analysis in machine learning. In Proceedings of the 23rd Annual ACM Symposium on User Interface Software and Technology. 37--46.
[48]
Amandalynne Paullada, Inioluwa Deborah Raji, Emily M. Bender, Emily Denton, and Alex Hanna. 2020. Data and its (dis)contents: A survey of dataset development and use in machine learning research. In NeurIPS Workshop on Machine Learning Retrospectives, Surveys, and Meta-analyses.
[49]
Bogdana Rakova, Jingying Yang, Henriette Cramer, and Rumman Chowdhury. 2021. Where Responsible AI meets Reality: Practitioner Perspectives on Enablers for Shifting Organizational Practices. Proceedings of the ACM on Human-Computer Interaction, Vol. 5, CSCW1, Article 7 (2021).
[50]
Kjeld Schmidt. 2008. Taking CSCW Seriously: Supporting Articulation Work (1992). In Cooperative Work and Coordinative Practices. Springer, 45--71.
[51]
Andrew D Selbst. 2017. Disparate impact in big data policing. Ga. L. Rev., Vol. 52 (2017), 109.
[52]
M. Six Silberman, Bill Tomlinson, Rochelle LaPlante, Joel Ross, Lilly Irani, and Andrew Zaldivar. 2018. Responsible Research with Crowds: Pay Crowdworkers at Least Minimum Wage. Commun. ACM, Vol. 61, 3 (March 2018), 39--41.
[53]
Susan Leigh Star and James R Griesemer. 1989. Institutional ecology, 'translations' and boundary objects: Amateurs and professionals in Berkeley's Museum of Vertebrate Zoology, 1907--39. Social Studies of Science, Vol. 19, 3 (1989), 387--420.
[54]
Luke Stark. 2019. Facial Recognition is the Plutonium of AI. ACM XRDS, Vol. 25, 3 (2019), 50--55.
[55]
Anselm Strauss. 1988. The articulation of project work: An organizational process. Sociological Quarterly, Vol. 29, 2 (1988), 163--178.
[56]
Michael Veale, Max Van Kleek, and Reuben Binns. 2018. Fairness and accountability design needs for algorithmic support in high-stakes public sector decision-making. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. 1--14.
[57]
Jessica Vitak, Katie Shilton, and Zahra Ashktorab. 2016. Beyond the Belmont principles: Ethical challenges, practices, and beliefs in the online data research community. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing. 941--953.
[58]
Leo Yeykelis. 2018. Why It's Wrong To Ask Users What They Want (And What To Ask Instead). Forbes (May 2018). https://www.forbes.com/sites/leoyeykelis/2018/05/10/why-its-wrong-to-ask-users-what-they-want-and-what-to-ask-instead/?sh=1449b7c91f22
[59]
John R. Zech, Marcus A. Badgeley, Manway Liu, Anthony B. Costa, Joseph J. Titano, and Eric Karl Oermann. 2018. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Medicine, Vol. 15, 11 (2018).
[60]
Amy X Zhang, Michael Muller, and Dakuo Wang. 2020. How do data science workers collaborate? Roles, workflows, and tools. Proceedings of the ACM on Human-Computer Interaction, Vol. 4, CSCW1 (2020), 1--23.


Published In

Proceedings of the ACM on Human-Computer Interaction, Volume 6, Issue CSCW2 (CSCW). November 2022. 8205 pages.
EISSN: 2573-0142
DOI: 10.1145/3571154
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 November 2022
Published in PACMHCI Volume 6, Issue CSCW2


Author Tags

  1. datasets
  2. documentation
  3. machine learning
  4. responsible AI

Qualifiers

  • Research-article
