Learning representations for counterfactual inference

Authors:

Fredrik D. Johansson,

David Sontag

ICML'16: Proceedings of the 33rd International Conference on Machine Learning - Volume 48

Pages 3020 - 3029

Published: 19 June 2016

Abstract

Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as, "Would this patient have lower blood sugar had she received a different medication?" We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning. In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art.

Publisher: JMLR.org
