Trusting Fair Data: Leveraging Quality in Fairness-Driven Data Removal Techniques

Duong, Manh Khoi; Conrad, Stefan

doi:10.1007/978-3-031-68323-7_33

arXiv:2405.12926 (cs)

[Submitted on 21 May 2024 (v1), last revised 19 Sep 2024 (this version, v3)]

Abstract: In this paper, we deal with bias mitigation techniques that remove specific data points from the training set to aim for a fair representation of the population in that set. Machine learning models are trained on these pre-processed datasets, and their predictions are expected to be fair. However, such approaches may exclude relevant data, making the attained subsets less trustworthy for further usage. To enhance the trustworthiness of prior methods, we propose additional requirements and objectives that the subsets must fulfill in addition to fairness: (1) group coverage, and (2) minimal data loss. While removing entire groups may improve the measured fairness, this practice is very problematic as failing to represent every group cannot be considered fair. In our second concern, we advocate for the retention of data while minimizing discrimination. By introducing a multi-objective optimization problem that considers fairness and data loss, we propose a methodology to find Pareto-optimal solutions that balance these objectives. By identifying such solutions, users can make informed decisions about the trade-off between fairness and data quality and select the most suitable subset for their application. Our method is distributed as a Python package via PyPI under the name FairDo (this https URL).
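
The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the FairDo API. It assumes a toy dataset of (group, binary label) pairs, uses the difference in positive-label rates between groups as a stand-in discrimination measure, and enumerates all subsets by brute force, which is only feasible for tiny inputs. All names (`DATA`, `disparity`, `candidates`, `pareto_front`) are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Toy dataset of (protected group, binary label) pairs. Values are illustrative only.
DATA = [
    ("a", 1), ("a", 1), ("a", 0),
    ("b", 1), ("b", 0), ("b", 0),
]


def positive_rate(rows, group):
    """Share of positive labels within one group (exact arithmetic)."""
    labels = [y for g, y in rows if g == group]
    return Fraction(sum(labels), len(labels))


def disparity(rows, groups):
    """Assumed discrimination measure: largest gap in positive rates across groups."""
    rates = [positive_rate(rows, g) for g in groups]
    return max(rates) - min(rates)


def covers_all_groups(rows, groups):
    """Group coverage requirement: every group keeps at least one data point."""
    return {g for g, _ in rows} == set(groups)


def candidates(data):
    """Enumerate every non-empty subset that satisfies group coverage.

    Returns tuples (kept indices, disparity, data loss), where data loss is
    the fraction of removed points. Brute force, exponential in len(data).
    """
    groups = sorted({g for g, _ in data})
    n = len(data)
    result = []
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            subset = [data[i] for i in idx]
            if not covers_all_groups(subset, groups):
                continue  # removing an entire group is not allowed
            result.append((idx, disparity(subset, groups), Fraction(n - k, n)))
    return result


def pareto_front(points):
    """Keep the candidates that no other candidate dominates in both objectives."""
    front = []
    for p in points:
        dominated = any(
            q[1] <= p[1] and q[2] <= p[2] and (q[1] < p[1] or q[2] < p[2])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


if __name__ == "__main__":
    front = pareto_front(candidates(DATA))
    # Several subsets can share the same objective values, so print distinct pairs.
    for disp, loss in sorted({(d, l) for _, d, l in front}):
        print(f"disparity={disp}, data loss={loss}")
```

Each entry on the resulting front is a subset for which disparity cannot be lowered without removing more data, which is the kind of choice the abstract leaves to the user. For realistic dataset sizes the exhaustive enumeration would have to be replaced by a heuristic multi-objective solver.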

Comments:	The Version of Record of this contribution is published in Springer LNCS 14912 and is available online at this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.12926 [cs.LG]
	(or arXiv:2405.12926v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2405.12926
Journal reference:	Lecture Notes in Computer Science, Vol. 14912 (2024), pp. 375-380. Springer
Related DOI:	https://doi.org/10.1007/978-3-031-68323-7_33

Submission history

From: Manh Khoi Duong
[v1] Tue, 21 May 2024 16:51:28 UTC (88 KB)
[v2] Tue, 11 Jun 2024 14:22:14 UTC (88 KB)
[v3] Thu, 19 Sep 2024 11:31:09 UTC (93 KB)
