DOI: 10.1145/3583780.3615040
Research Article · Public Access

RoCourseNet: Robust Training of a Prediction Aware Recourse Model

Published: 21 October 2023

Abstract

Counterfactual (CF) explanations for machine learning (ML) models are preferred by end-users, as they explain the predictions of ML models by providing a recourse (or contrastive) case to individuals who are adversely impacted by predicted outcomes. Existing CF explanation methods generate recourses under the assumption that the underlying target ML model remains stationary over time. However, due to commonly occurring distributional shifts in training data, ML models are constantly updated in practice, which might render previously generated recourses invalid and diminish end-users' trust in our algorithmic framework. To address this problem, we propose RoCourseNet, a training framework that jointly optimizes predictions and recourses that are robust to future data shifts. This work contains four key contributions: (1) We formulate the robust recourse generation problem as a tri-level optimization problem consisting of two sub-problems: (i) an inner bi-level problem that finds the worst-case adversarial shift in the training data, and (ii) an outer minimization problem that generates robust recourses against this worst-case shift. (2) We leverage adversarial training to solve this tri-level optimization problem by (i) proposing a novel virtual data shift (VDS) algorithm that finds worst-case shifted ML models by explicitly considering the worst-case data shift in the training dataset, and (ii) employing a block-wise coordinate descent procedure to optimize predictions and the corresponding robust recourses. (3) We evaluate RoCourseNet's performance on three real-world datasets, showing that it consistently achieves more than 96% robust validity and outperforms state-of-the-art baselines by at least 10% in generating robust CF explanations. (4) Finally, we generalize the RoCourseNet framework to accommodate any parametric post-hoc method for improving robust validity.
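To make the structure of contribution (1) concrete, the tri-level problem can be sketched as follows. The notation is ours for exposition, not taken from the paper: $f_\theta$ is the predictor, $g_\phi$ the recourse generator, $\mathcal{D}$ the training set, $\Delta$ a bounded set of admissible data shifts, and $\lambda$ a trade-off weight.

```latex
\min_{\theta,\,\phi}\;
  \mathcal{L}_{\mathrm{pred}}\big(f_\theta;\,\mathcal{D}\big)
  \;+\;\lambda\,
  \max_{\delta\,\in\,\Delta}\;
  \mathcal{L}_{\mathrm{valid}}\big(f_{\theta'(\delta)},\,g_\phi;\,\mathcal{D}\big)
\qquad\text{s.t.}\qquad
\theta'(\delta)\in\arg\min_{\theta''}\;
  \mathcal{L}_{\mathrm{pred}}\big(f_{\theta''};\,\mathcal{D}+\delta\big)
```

The outer minimization over $(\theta,\phi)$ is level one; the maximization over the shift $\delta$ is level two; and the argmin defining the shifted model $\theta'(\delta)$ is level three. The inner max-argmin pair is thus the bi-level sub-problem described in the abstract.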
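Contribution (2) corresponds to an adversarial training loop. The sketch below is a minimal, illustrative PyTorch rendering under stated assumptions: the function and hyperparameter names (`virtual_data_shift`, `steps`, `alpha`, `eps`, `lam`) are ours, and the inner loop uses a generic worst-case data-perturbation proxy rather than the paper's exact VDS objective, which targets recourse invalidation directly.

```python
# Illustrative sketch only; names and the simplified inner objective are
# assumptions for exposition, not the authors' reference implementation.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(d_in, d_out, hidden=32):
    return nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(),
                         nn.Linear(hidden, d_out))

def virtual_data_shift(predictor, x, y, steps=7, alpha=0.05,
                       eps=0.1, inner_lr=0.1):
    """Inner bi-level sub-problem, crudely unrolled: alternate (a) a
    PGD-style ascent step on a bounded data perturbation delta and (b) a
    descent step refitting a cloned predictor on the shifted data."""
    shifted = copy.deepcopy(predictor)  # plays the "future shifted model"
    opt = torch.optim.SGD(shifted.parameters(), lr=inner_lr)
    delta = torch.zeros_like(x)
    for _ in range(steps):
        # (a) Worst-case shift proxy: perturb data to degrade the model.
        delta = delta.detach().requires_grad_(True)
        loss = F.binary_cross_entropy_with_logits(shifted(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        # (b) Refit the clone on the shifted data (the argmin constraint).
        opt.zero_grad()
        F.binary_cross_entropy_with_logits(
            shifted(x + delta.detach()), y).backward()
        opt.step()
    return shifted

def train_epoch(predictor, generator, opt_f, opt_g, loader, lam=0.5):
    """Outer minimization via block-wise coordinate descent."""
    for x, y in loader:
        # Block 1: update the predictor on the standard prediction loss.
        opt_f.zero_grad()
        F.binary_cross_entropy_with_logits(predictor(x), y).backward()
        opt_f.step()

        # Inner maximization: simulate the worst-case future model.
        shifted = virtual_data_shift(predictor, x, y)

        # Block 2: recourses g(x) must flip the label under the *shifted*
        # model (robust validity) while staying close to the input x.
        opt_g.zero_grad()
        x_cf = generator(x)
        robust_validity = F.binary_cross_entropy_with_logits(
            shifted(x_cf), 1.0 - y)
        proximity = (x_cf - x).pow(2).mean()
        (robust_validity + lam * proximity).backward()
        opt_g.step()

if __name__ == "__main__":
    d = 8
    x = torch.randn(256, d)
    y = (x.sum(dim=1, keepdim=True) > 0).float()
    loader = [(x[i:i + 64], y[i:i + 64]) for i in range(0, 256, 64)]
    predictor, generator = mlp(d, 1), mlp(d, d)
    opt_f = torch.optim.Adam(predictor.parameters(), lr=1e-3)
    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
    for _ in range(3):
        train_epoch(predictor, generator, opt_f, opt_g, loader)
```

Each outer step runs the two coordinate-descent blocks in turn: the predictor is updated on the plain prediction loss, and the generator is updated so its counterfactuals remain valid under the worst-case shifted model, which is what robust validity measures at deployment time.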




    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
    October 2023
    5508 pages
    ISBN:9798400701245
    DOI:10.1145/3583780
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. adversarial machine learning
    2. algorithmic recourse
    3. counterfactual explanation
    4. explainable artificial intelligence
    5. interpretability


    Funding Sources

    • ARO

Conference

CIKM '23

Acceptance Rates

Overall acceptance rate: 1,861 of 8,427 submissions, 22%
