DOI: 10.5555/3045390.3045708
Article

Learning representations for counterfactual inference

Published: 19 June 2016

Abstract

Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as, "Would this patient have lower blood sugar had she received a different medication?". We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning. In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art.

Published In

ICML'16: Proceedings of the 33rd International Conference on Machine Learning - Volume 48
June 2016
3077 pages

Publisher

JMLR.org

