Search | arXiv e-print repository

Explanation Hacking: The perils of algorithmic recourse

Authors: Emily Sullivan, Atoosa Kasirzadeh

Abstract: We argue that the trend toward providing users with feasible and actionable explanations of AI decisions, known as recourse explanations, comes with ethical downsides. Specifically, we argue that recourse explanations face several conceptual pitfalls and can lead to problematic explanation hacking, which undermines their ethical status. As an alternative, we advocate that explanations of AI decisi… ▽ More We argue that the trend toward providing users with feasible and actionable explanations of AI decisions, known as recourse explanations, comes with ethical downsides. Specifically, we argue that recourse explanations face several conceptual pitfalls and can lead to problematic explanation hacking, which undermines their ethical status. As an alternative, we advocate that explanations of AI decisions should aim at understanding. △ Less

Submitted 22 March, 2024; originally announced June 2024.

arXiv:2405.13974 [pdf, other]

CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models

Authors: Giada Pistilli, Alina Leidinger, Yacine Jernite, Atoosa Kasirzadeh, Alexandra Sasha Luccioni, Margaret Mitchell

Abstract: This paper introduces the "CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset, designed to evaluate the social and cultural variation of Large Language Models (LLMs) across multiple languages and value-sensitive topics. We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, so… ▽ More This paper introduces the "CIVICS: Culturally-Informed & Values-Inclusive Corpus for Societal impacts" dataset, designed to evaluate the social and cultural variation of Large Language Models (LLMs) across multiple languages and value-sensitive topics. We create a hand-crafted, multilingual dataset of value-laden prompts which address specific socially sensitive topics, including LGBTQI rights, social welfare, immigration, disability rights, and surrogacy. CIVICS is designed to generate responses showing LLMs' encoded and implicit values. Through our dynamic annotation processes, tailored prompt design, and experiments, we investigate how open-weight LLMs respond to value-sensitive issues, exploring their behavior across diverse linguistic and cultural contexts. Using two experimental set-ups based on log-probabilities and long-form responses, we show social and cultural variability across different LLMs. Specifically, experiments involving long-form responses demonstrate that refusals are triggered disparately across models, but consistently and more frequently in English or translated statements. Moreover, specific topics and sources lead to more pronounced differences across model answers, particularly on immigration, LGBTQI rights, and social welfare. As shown by our experiments, the CIVICS dataset aims to serve as a tool for future research, promoting reproducibility and transparency across broader linguistic settings, and furthering the development of AI technologies that respect and reflect global cultural diversities and value pluralism. The CIVICS dataset and tools will be made available upon publication under open licenses; an anonymized version is currently available at https://huggingface.co/CIVICS-dataset. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2404.09932 [pdf, other]

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi , et al. (13 additional authors not shown)

Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose $200+$ concrete research questions. This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose $200+$ concrete research questions. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.00579 [pdf, other]

A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys)

Authors: Yashar Deldjoo, Zhankui He, Julian McAuley, Anton Korikov, Scott Sanner, Arnau Ramisa, René Vidal, Maheswaran Sathiamoorthy, Atoosa Kasirzadeh, Silvia Milano

Abstract: Traditional recommender systems (RS) typically use user-item rating histories as their main data source. However, deep generative models now have the capability to model and sample from complex data distributions, including user-item interactions, text, images, and videos, enabling novel recommendation tasks. This comprehensive, multidisciplinary survey connects key advancements in RS using Genera… ▽ More Traditional recommender systems (RS) typically use user-item rating histories as their main data source. However, deep generative models now have the capability to model and sample from complex data distributions, including user-item interactions, text, images, and videos, enabling novel recommendation tasks. This comprehensive, multidisciplinary survey connects key advancements in RS using Generative Models (Gen-RecSys), covering: interaction-driven generative models; the use of large language models (LLM) and textual data for natural language recommendation; and the integration of multimodal models for generating and processing images/videos in RS. Our work highlights necessary paradigms for evaluating the impact and harm of Gen-RecSys and identifies open challenges. This survey accompanies a tutorial presented at ACM KDD'24, with supporting materials provided at: https://encr.pw/vDhLq. △ Less

Submitted 4 July, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

Comments: This survey accompanies a tutorial presented at ACM KDD'24

arXiv:2402.06811 [pdf, ps, other]

Discipline and Label: A WEIRD Genealogy and Social Theory of Data Annotation

Authors: Andrew Smart, Ding Wang, Ellis Monk, Mark Díaz, Atoosa Kasirzadeh, Erin Van Liemt, Sonja Schmer-Galunder

Abstract: Data annotation remains the sine qua non of machine learning and AI. Recent empirical work on data annotation has begun to highlight the importance of rater diversity for fairness, model performance, and new lines of research have begun to examine the working conditions for data annotation workers, the impacts and role of annotator subjectivity on labels, and the potential psychological harms from… ▽ More Data annotation remains the sine qua non of machine learning and AI. Recent empirical work on data annotation has begun to highlight the importance of rater diversity for fairness, model performance, and new lines of research have begun to examine the working conditions for data annotation workers, the impacts and role of annotator subjectivity on labels, and the potential psychological harms from aspects of annotation work. This paper outlines a critical genealogy of data annotation; starting with its psychological and perceptual aspects. We draw on similarities with critiques of the rise of computerized lab-based psychological experiments in the 1970's which question whether these experiments permit the generalization of results beyond the laboratory settings within which these results are typically obtained. Do data annotations permit the generalization of results beyond the settings, or locations, in which they were obtained? Psychology is overly reliant on participants from Western, Educated, Industrialized, Rich, and Democratic societies (WEIRD). Many of the people who work as data annotation platform workers, however, are not from WEIRD countries; most data annotation workers are based in Global South countries. Social categorizations and classifications from WEIRD countries are imposed on non-WEIRD annotators through instructions and tasks, and through them, on data, which is then used to train or evaluate AI models in WEIRD countries. We synthesize evidence from several recent lines of research and argue that data annotation is a form of automated social categorization that risks entrenching outdated and static social categories that are in reality dynamic and changing. We propose a framework for understanding the interplay of the global social conditions of data annotation with the subjective phenomenological experience of data annotation work. △ Less

Submitted 9 February, 2024; originally announced February 2024.

Comments: 18 pages

arXiv:2401.07836 [pdf, ps, other]

Two Types of AI Existential Risk: Decisive and Accumulative

Authors: Atoosa Kasirzadeh

Abstract: The conventional discourse on existential risks (x-risks) from AI typically focuses on abrupt, dire events caused by advanced AI systems, particularly those that might achieve or surpass human-level intelligence. These events have severe consequences that either lead to human extinction or irreversibly cripple human civilization to a point beyond recovery. This discourse, however, often neglects t… ▽ More The conventional discourse on existential risks (x-risks) from AI typically focuses on abrupt, dire events caused by advanced AI systems, particularly those that might achieve or surpass human-level intelligence. These events have severe consequences that either lead to human extinction or irreversibly cripple human civilization to a point beyond recovery. This discourse, however, often neglects the serious possibility of AI x-risks manifesting incrementally through a series of smaller yet interconnected disruptions, gradually crossing critical thresholds over time. This paper contrasts the conventional "decisive AI x-risk hypothesis" with an "accumulative AI x-risk hypothesis." While the former envisions an overt AI takeover pathway, characterized by scenarios like uncontrollable superintelligence, the latter suggests a different causal pathway to existential catastrophes. This involves a gradual accumulation of critical AI-induced threats such as severe vulnerabilities and systemic erosion of econopolitical structures. The accumulative hypothesis suggests a boiling frog scenario where incremental AI risks slowly converge, undermining resilience until a triggering event results in irreversible collapse. Through systems analysis, this paper examines the distinct assumptions differentiating these two hypotheses. It is then argued that the accumulative view reconciles seemingly incompatible perspectives on AI risks. The implications of differentiating between these causal pathways -- the decisive and the accumulative -- for the governance of AI risks as well as long-term AI safety are discussed. △ Less

Submitted 6 February, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

arXiv:2307.05543 [pdf, ps, other]

Typology of Risks of Generative Text-to-Image Models

Authors: Charlotte Bird, Eddie L. Ungless, Atoosa Kasirzadeh

Abstract: This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney, through a comprehensive literature review. While these models offer unprecedented capabilities for generating images, their development and use introduce new types of risk that require careful consideration. Our review reveals significant knowledge gaps concerni… ▽ More This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney, through a comprehensive literature review. While these models offer unprecedented capabilities for generating images, their development and use introduce new types of risk that require careful consideration. Our review reveals significant knowledge gaps concerning the understanding and treatment of these risks despite some already being addressed. We offer a taxonomy of risks across six key stakeholder groups, inclusive of unexplored issues, and suggest future research directions. We identify 22 distinct risk types, spanning issues from data bias to malicious use. The investigation presented here is intended to enhance the ongoing discourse on responsible model development and deployment. By highlighting previously overlooked risks and gaps, it aims to shape subsequent research and governance initiatives, guiding them toward the responsible, secure, and ethically conscious evolution of text-to-image models. △ Less

Submitted 8 July, 2023; originally announced July 2023.

Comments: Accepted for publication in 2023 AAAI/ACM Conference on AI, Ethics, and Society (AIES 2023)

arXiv:2306.01479 [pdf, ps, other]

Reconciling Governmental Use of Online Targeting With Democracy

Authors: Katja Andric, Atoosa Kasirzadeh

Abstract: The societal and epistemological implications of online targeted advertising have been scrutinized by AI ethicists, legal scholars, and policymakers alike. However, the government's use of online targeting and its consequential socio-political ramifications remain under-explored from a critical socio-technical standpoint. This paper investigates the socio-political implications of governmental onl… ▽ More The societal and epistemological implications of online targeted advertising have been scrutinized by AI ethicists, legal scholars, and policymakers alike. However, the government's use of online targeting and its consequential socio-political ramifications remain under-explored from a critical socio-technical standpoint. This paper investigates the socio-political implications of governmental online targeting, using a case study of the UK government's application of such techniques for public policy objectives. We argue that this practice undermines democratic ideals, as it engenders three primary concerns -- Transparency, Privacy, and Equality -- that clash with fundamental democratic doctrines and values. To address these concerns, the paper introduces a preliminary blueprint for an AI governance framework that harmonizes governmental use of online targeting with certain democratic principles. Furthermore, we advocate for the creation of an independent, non-governmental regulatory body responsible for overseeing the process and monitoring the government's use of online targeting, a critical measure for preserving democratic values. △ Less

Submitted 2 June, 2023; originally announced June 2023.

Comments: Accepted for publication in 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT 2023)

arXiv:2304.11163 [pdf, other]

ChatGPT, Large Language Technologies, and the Bumpy Road of Benefiting Humanity

Authors: Atoosa Kasirzadeh

Abstract: The allure of emerging AI technologies is undoubtedly thrilling. However, the promise that AI technologies will benefit all of humanity is empty so long as we lack a nuanced understanding of what humanity is supposed to be in the face of widening global inequality and pressing existential threats. Going forward, it is crucial to invest in rigorous and collaborative AI safety and ethics research. W… ▽ More The allure of emerging AI technologies is undoubtedly thrilling. However, the promise that AI technologies will benefit all of humanity is empty so long as we lack a nuanced understanding of what humanity is supposed to be in the face of widening global inequality and pressing existential threats. Going forward, it is crucial to invest in rigorous and collaborative AI safety and ethics research. We also need to develop standards in a sustainable and equitable way that differentiate between merely speculative and well-researched questions. Only the latter enable us to co-construct and deploy the values that are necessary for creating beneficial AI. Failure to do so could result in a future in which our AI technological advancements outstrip our ability to navigate their ethical and social implications. This path we do not want to go down. △ Less

Submitted 21 April, 2023; originally announced April 2023.

Comments: As part of a series on Dailynous : "Philosophers on next-generation large language models"

arXiv:2209.00731 [pdf, ps, other]

In conversation with Artificial Intelligence: aligning language models with human values

Authors: Atoosa Kasirzadeh, Iason Gabriel

Abstract: Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational… ▽ More Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this paper, we propose a number of steps that help answer these questions. We start by developing a philosophical analysis of the building blocks of linguistic communication between conversational agents and human interlocutors. We then use this analysis to identify and formulate ideal norms of conversation that can govern successful linguistic communication between humans and conversational agents. Furthermore, we explore how these norms can be used to align conversational agents with human values across a range of different discursive domains. We conclude by discussing the practical implications of our proposal for the design of conversational agents that are aligned with these norms and values. △ Less

Submitted 21 December, 2022; v1 submitted 1 September, 2022; originally announced September 2022.

Comments: Accepted for publication with minor revisions at Philosophy & Technology

arXiv:2206.00945 [pdf, ps, other]

doi 10.1145/3514094.3534188

Algorithmic Fairness and Structural Injustice: Insights from Feminist Political Philosophy

Authors: Atoosa Kasirzadeh

Abstract: Data-driven predictive algorithms are widely used to automate and guide high-stake decision making such as bail and parole recommendation, medical resource distribution, and mortgage allocation. Nevertheless, harmful outcomes biased against vulnerable groups have been reported. The growing research field known as 'algorithmic fairness' aims to mitigate these harmful biases. Its primary methodology… ▽ More Data-driven predictive algorithms are widely used to automate and guide high-stake decision making such as bail and parole recommendation, medical resource distribution, and mortgage allocation. Nevertheless, harmful outcomes biased against vulnerable groups have been reported. The growing research field known as 'algorithmic fairness' aims to mitigate these harmful biases. Its primary methodology consists in proposing mathematical metrics to address the social harms resulting from an algorithm's biased outputs. The metrics are typically motivated by -- or substantively rooted in -- ideals of distributive justice, as formulated by political and legal philosophers. The perspectives of feminist political philosophers on social justice, by contrast, have been largely neglected. Some feminist philosophers have criticized the paradigm of distributive justice and have proposed corrective amendments to surmount its limitations. The present paper brings some key insights of feminist political philosophy to algorithmic fairness. The paper has three goals. First, I show that algorithmic fairness does not accommodate structural injustices in its current scope. Second, I defend the relevance of structural injustices -- as pioneered in the contemporary philosophical literature by Iris Marion Young -- to algorithmic fairness. Third, I take some steps in developing the paradigm of 'responsible algorithmic fairness' to correct for errors in the current scope and implementation of algorithmic fairness. △ Less

Submitted 2 June, 2022; originally announced June 2022.

Comments: This paper is accepted for publication in the Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (AIES 22)

arXiv:2112.04359 [pdf, other]

Ethical and social risks of harm from Language Models

Authors: Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel

Abstract: This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguist… ▽ More This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). In order to foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. A wide range of established and anticipated risks are analysed in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and social sciences. We outline six specific risk areas: I. Discrimination, Exclusion and Toxicity, II. Information Hazards, III. Misinformation Harms, V. Malicious Uses, V. Human-Computer Interaction Harms, VI. Automation, Access, and Environmental Harms. The first area concerns the perpetuation of stereotypes, unfair discrimination, exclusionary norms, toxic language, and lower performance by social group for LMs. The second focuses on risks from private data leaks or LMs correctly inferring sensitive information. The third addresses risks arising from poor, false or misleading information including in sensitive domains, and knock-on risks such as the erosion of trust in shared information. The fourth considers risks from actors who try to use LMs to cause harm. The fifth focuses on risks specific to LLMs used to underpin conversational agents that interact with human users, including unsafe use, manipulation or deception. The sixth discusses the risk of environmental harm, job automation, and other challenges that may have a disparate effect on different social groups or communities. In total, we review 21 risks in-depth. We discuss the points of origin of different risks and point to potential mitigation approaches. Lastly, we discuss organisational responsibilities in implementing mitigations, and the role of collaboration and participation. We highlight directions for further research, particularly on expanding the toolkit for assessing and evaluating the outlined risks in LMs. △ Less

Submitted 8 December, 2021; originally announced December 2021.

arXiv:2109.06309 [pdf, ps, other]

doi 10.1145/3461702.3462528

Fairness and Data Protection Impact Assessments

Authors: Atoosa Kasirzadeh, Damian Clifford

Abstract: In this paper, we critically examine the effectiveness of the requirement to conduct a Data Protection Impact Assessment (DPIA) in Article 35 of the General Data Protection Regulation (GDPR) in light of fairness metrics. Through this analysis, we explore the role of the fairness principle as introduced in Article 5(1)(a) and its multifaceted interpretation in the obligation to conduct a DPIA. Our… ▽ More In this paper, we critically examine the effectiveness of the requirement to conduct a Data Protection Impact Assessment (DPIA) in Article 35 of the General Data Protection Regulation (GDPR) in light of fairness metrics. Through this analysis, we explore the role of the fairness principle as introduced in Article 5(1)(a) and its multifaceted interpretation in the obligation to conduct a DPIA. Our paper argues that although there is a significant theoretical role for the considerations of fairness in the DPIA process, an analysis of the various guidance documents issued by data protection authorities on the obligation to conduct a DPIA reveals that they rarely mention the fairness principle in practice. △ Less

Submitted 13 September, 2021; originally announced September 2021.

Journal ref: AIES '21: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society

arXiv:2109.04083 [pdf, other]

doi 10.1145/3600211.3604669

User Tampering in Reinforcement Learning Recommender Systems

Authors: Charles Evans, Atoosa Kasirzadeh

Abstract: In this paper, we introduce new formal methods and provide empirical evidence to highlight a unique safety concern prevalent in reinforcement learning (RL)-based recommendation algorithms -- 'user tampering.' User tampering is a situation where an RL-based recommender system may manipulate a media user's opinions through its suggestions as part of a policy to maximize long-term user engagement. We… ▽ More In this paper, we introduce new formal methods and provide empirical evidence to highlight a unique safety concern prevalent in reinforcement learning (RL)-based recommendation algorithms -- 'user tampering.' User tampering is a situation where an RL-based recommender system may manipulate a media user's opinions through its suggestions as part of a policy to maximize long-term user engagement. We use formal techniques from causal modeling to critically analyze prevailing solutions proposed in the literature for implementing scalable RL-based recommendation systems, and we observe that these methods do not adequately prevent user tampering. Moreover, we evaluate existing mitigation strategies for reward tampering issues, and show that these methods are insufficient in addressing the distinct phenomenon of user tampering within the context of recommendations. We further reinforce our findings with a simulation study of an RL-based recommendation system focused on the dissemination of political content. Our study shows that a Q-learning algorithm consistently learns to exploit its opportunities to polarize simulated users with its early recommendations in order to have more consistent success with subsequent recommendations that align with this induced polarization. Our findings emphasize the necessity for developing safer RL-based recommendation systems and suggest that achieving such safety would require a fundamental shift in the design away from the approaches we have seen in the recent literature. △ Less

Submitted 24 July, 2023; v1 submitted 9 September, 2021; originally announced September 2021.

Comments: In proceedings of the 6th AAAI/ACM Conference on Artificial Intelligence, Ethics and Society (AIES '23)

arXiv:2103.00752 [pdf, ps, other]

Reasons, Values, Stakeholders: A Philosophical Framework for Explainable Artificial Intelligence

Authors: Atoosa Kasirzadeh

Abstract: The societal and ethical implications of the use of opaque artificial intelligence systems for consequential decisions, such as welfare allocation and criminal justice, have generated a lively debate among multiple stakeholder groups, including computer scientists, ethicists, social scientists, policy makers, and end users. However, the lack of a common language or a multi-dimensional framework to… ▽ More The societal and ethical implications of the use of opaque artificial intelligence systems for consequential decisions, such as welfare allocation and criminal justice, have generated a lively debate among multiple stakeholder groups, including computer scientists, ethicists, social scientists, policy makers, and end users. However, the lack of a common language or a multi-dimensional framework to appropriately bridge the technical, epistemic, and normative aspects of this debate prevents the discussion from being as productive as it could be. Drawing on the philosophical literature on the nature and value of explanations, this paper offers a multi-faceted framework that brings more conceptual precision to the present debate by (1) identifying the types of explanations that are most pertinent to artificial intelligence predictions, (2) recognizing the relevance and importance of social and ethical values for the evaluation of these explanations, and (3) demonstrating the importance of these explanations for incorporating a diversified approach to improving the design of truthful algorithmic ecosystems. The proposed philosophical framework thus lays the groundwork for establishing a pertinent connection between the technical and ethical aspects of artificial intelligence systems. △ Less

Submitted 28 February, 2021; originally announced March 2021.

Comments: This paper is accepted for non-archival publication at the ACM conference on Fairness, Accountability, and Transparency (FAccT) 2021

arXiv:2102.05085 [pdf, ps, other]

The Use and Misuse of Counterfactuals in Ethical Machine Learning

Authors: Atoosa Kasirzadeh, Andrew Smart

Abstract: The use of counterfactuals for considerations of algorithmic fairness and explainability is gaining prominence within the machine learning community and industry. This paper argues for more caution with the use of counterfactuals when the facts to be considered are social categories such as race or gender. We review a broad body of papers from philosophy and social sciences on social ontology and… ▽ More The use of counterfactuals for considerations of algorithmic fairness and explainability is gaining prominence within the machine learning community and industry. This paper argues for more caution with the use of counterfactuals when the facts to be considered are social categories such as race or gender. We review a broad body of papers from philosophy and social sciences on social ontology and the semantics of counterfactuals, and we conclude that the counterfactual approach in machine learning fairness and social explainability can require an incoherent theory of what social categories are. Our findings suggest that most often the social categories may not admit counterfactual manipulation, and hence may not appropriately satisfy the demands for evaluating the truth or falsity of counterfactuals. This is important because the widespread use of counterfactuals in machine learning can lead to misleading results when applied in high-stakes domains. Accordingly, we argue that even though counterfactuals play an essential part in some causal inferences, their use for questions of algorithmic fairness and social explanations can create more problems than they resolve. Our positive result is a set of tenets about using counterfactuals for fairness and explanations in machine learning. △ Less

Submitted 9 February, 2021; originally announced February 2021.

Comments: 9 pages, 1 table, 1 figure

arXiv:1910.13607 [pdf, ps, other]

Mathematical decisions and non-causal elements of explainable AI

Authors: Atoosa Kasirzadeh

Abstract: The social implications of algorithmic decision-making in sensitive contexts have generated lively debates among multiple stakeholders, such as moral and political philosophers, computer scientists, and the public. Yet, the lack of a common language and a conceptual framework for an appropriate bridging of the moral, technical, and political aspects of the debate prevents the discussion to be as e… ▽ More The social implications of algorithmic decision-making in sensitive contexts have generated lively debates among multiple stakeholders, such as moral and political philosophers, computer scientists, and the public. Yet, the lack of a common language and a conceptual framework for an appropriate bridging of the moral, technical, and political aspects of the debate prevents the discussion to be as effective as it can be. Social scientists and psychologists are contributing to this debate by gathering a wealth of empirical data, yet a philosophical analysis of the social implications of algorithmic decision-making remains comparatively impoverished. In attempting to address this lacuna, this paper argues that a hierarchy of different types of explanations for why and how an algorithmic decision outcome is achieved can establish the relevant connection between the moral and technical aspects of algorithmic decision-making. In particular, I offer a multi-faceted conceptual framework for the explanations and the interpretations of algorithmic decisions, and I claim that this framework can lay the groundwork for a focused discussion among multiple stakeholders about the social implications of algorithmic decision-making, as well as AI governance and ethics more generally. △ Less

Submitted 12 December, 2019; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: A shorter version of this paper was presented at the NeurIPS 2019, Human-Centric Machine Learning Workshop

Showing 1–17 of 17 results for author: Kasirzadeh, A