Research Article

Consistent Range Approximation for Fair Predictive Modeling

Published: 01 July 2023

Abstract

This paper proposes a novel framework for certifying the fairness of predictive models trained on biased data. It draws on techniques from query answering over incomplete and inconsistent databases to formulate the problem of consistent range approximation (CRA) of fairness queries for a predictive model on a target population. The framework combines background knowledge about the data collection process with the biased data itself, optionally supplemented by limited statistics about the target population, to compute a range of answers for fairness queries. Using CRA, the framework builds predictive models that are certifiably fair on the target population, regardless of the availability of external data during training. The framework's efficacy is demonstrated through evaluations on real data, showing substantial improvement over existing state-of-the-art methods.
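
The range-computation idea in the abstract can be illustrated with a small sketch. The snippet below is not the paper's CRA algorithm (the authors' implementation is in the CRAB repository); it is a minimal illustration under one simple, assumed selection-bias model: within each group, the odds of a record being sampled may differ between positive and negative outcomes by at most a factor gamma. Under that assumption, the target-population positive rate of each group is confined to a computable interval, and a fairness query such as statistical parity difference can only range over the induced interval; a model is certified fair only if the entire interval satisfies the fairness threshold. All function names and the gamma-bounded model are illustrative assumptions, not the paper's API.

```python
# Minimal sketch (NOT the paper's CRA algorithm): bounding a statistical
# parity query under an assumed selection-bias model. Within each group a,
# records enter the biased sample with probability s(a, y) that may depend
# on the outcome y; we assume only that the odds ratio
#     lambda_a = s(a, 0) / s(a, 1)
# lies in [1/gamma, gamma]. Solving the biased-rate identity
#     p_biased = p / (p + lambda_a * (1 - p))
# for the target-population rate p yields the interval computed below.

def rate_range(p_biased: float, gamma: float) -> tuple[float, float]:
    """Consistent range of P(Y=1 | A=a) on the target population."""
    # Extremes of the interval are attained at lambda = 1/gamma and gamma.
    lo = p_biased / (p_biased + gamma * (1.0 - p_biased))
    hi = gamma * p_biased / (gamma * p_biased + 1.0 - p_biased)
    return lo, hi


def spd_range(p1_biased: float, p0_biased: float,
              gamma: float) -> tuple[float, float]:
    """Consistent range of P(Y=1 | A=1) - P(Y=1 | A=0)."""
    lo1, hi1 = rate_range(p1_biased, gamma)
    lo0, hi0 = rate_range(p0_biased, gamma)
    return lo1 - hi0, hi1 - lo0


def certify_fair(p1_biased: float, p0_biased: float,
                 gamma: float, eps: float = 0.05) -> bool:
    """Certify only if the *entire* consistent range lies within [-eps, eps]."""
    lo, hi = spd_range(p1_biased, p0_biased, gamma)
    return -eps <= lo and hi <= eps


if __name__ == "__main__":
    # Positive rates estimated from the biased sample for the two groups;
    # gamma bounds how strongly selection may favor one outcome.
    lo, hi = spd_range(0.42, 0.40, gamma=1.2)
    print(f"SPD consistent range: [{lo:.3f}, {hi:.3f}]")
    print("certified fair:", certify_fair(0.42, 0.40, gamma=1.2))
```

In the paper's setting, CRA tightens such ranges using richer background knowledge about the selection process and any available target-population statistics; the odds-ratio bound above merely stands in for that knowledge.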



Published In

Proceedings of the VLDB Endowment, Volume 16, Issue 11 (July 2023), 789 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 July 2023, in PVLDB Volume 16, Issue 11
