
Assessing the Fairness of AI Systems: AI Practitioners' Processes, Challenges, and Needs for Support

Published: 07 April 2022

Abstract

Various tools and practices have been developed to support practitioners in identifying, assessing, and mitigating fairness-related harms caused by AI systems. However, prior research has highlighted gaps between the intended design of these tools and practices and their use within particular contexts, including gaps caused by the role that organizational factors play in shaping fairness work. In this paper, we investigate these gaps for one such practice: disaggregated evaluations of AI systems, intended to uncover performance disparities between demographic groups. By conducting semi-structured interviews and structured workshops with thirty-three AI practitioners from ten teams at three technology companies, we identify practitioners' processes, challenges, and needs for support when designing disaggregated evaluations. We find that practitioners face challenges when choosing performance metrics, identifying the most relevant direct stakeholders and demographic groups on which to focus, and collecting datasets with which to conduct disaggregated evaluations. More generally, we identify impacts on fairness work stemming from a lack of engagement with direct stakeholders or domain experts, business imperatives that prioritize customers over marginalized groups, and the drive to deploy AI systems at scale.
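
The disaggregated evaluations discussed in the abstract boil down to computing a chosen performance metric separately for each demographic group and comparing the results, rather than reporting a single aggregate number. As a rough illustration only, and not code from the paper, a minimal sketch in Python with hypothetical toy data might look like this:

```python
# Illustrative sketch only (the paper reports a qualitative study, not code):
# a minimal disaggregated evaluation computes one performance metric per
# demographic group and compares the results. All names and data below are
# hypothetical.
from collections import defaultdict

def disaggregated_accuracy(y_true, y_pred, groups):
    """Return accuracy per group and the largest gap between any two groups."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(label == pred)
    per_group = {g: correct[g] / total[g] for g in total}
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap

# Hypothetical toy data: ground-truth labels, model predictions, and a
# demographic attribute for each example.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
groups = ["group_a"] * 3 + ["group_b"] * 5

per_group, gap = disaggregated_accuracy(y_true, y_pred, groups)
print(per_group)                      # accuracy per group (~0.67 vs 0.80 here)
print(f"largest accuracy gap: {gap:.2f}")
```

The paper's findings concern exactly the choices this sketch glosses over: which performance metric to compute, which direct stakeholders and demographic groups to prioritize, and how to collect the datasets needed to run such an evaluation at all.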

Supplementary Material

ZIP File (v6cscw1052aux.zip)
The supplemental material for our paper includes the protocols we used for the semi-structured interviews and the planning workshops.



Published In

Proceedings of the ACM on Human-Computer Interaction, Volume 6, Issue CSCW1
April 2022, 2511 pages
EISSN: 2573-0142
DOI: 10.1145/3530837

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. AI
  2. fairness
  3. machine learning
  4. software development practices

Qualifiers

  • Research-article
