
Gaussian Processes for Independence Tests with Non-iid Data in Causal Inference

Published: 26 November 2015

Abstract

In applied fields, practitioners hoping to apply causal structure learning or causal orientation algorithms face an important question: which independence test is appropriate for my data? In the case of real-valued iid data, linear dependencies, and Gaussian error terms, partial correlation is sufficient. But once any of these assumptions is relaxed, the situation becomes more complex. Kernel-based tests of independence have gained popularity in recent years for handling nonlinear dependencies, but testing for conditional independence remains a challenging problem. We highlight the important issue of non-iid observations: when data are observed in space, time, or on a network, “nearby” observations are likely to be similar. This fact biases estimates of dependence between variables. Inspired by the success of Gaussian process regression for handling non-iid observations in a wide variety of areas and by the usefulness of the Hilbert-Schmidt Independence Criterion (HSIC), a kernel-based independence test, we propose a simple framework to address all of these issues: first, use Gaussian process regression to control for certain variables and to obtain residuals; second, use HSIC to test for independence. We illustrate this on two classic datasets, one spatial, the other temporal, that are usually treated as iid. We show how properly accounting for spatial and temporal variation can lead to more reasonable causal graphs. We also show how highly structured data, like images and text, can be used in a causal inference framework using a novel structured input/output Gaussian process formulation. We demonstrate this idea on a dataset of translated sentences, trying to predict the source language.
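The two-step procedure in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes scikit-learn's `GaussianProcessRegressor` for the regression step, a biased HSIC estimator with a fixed RBF bandwidth, and a simple permutation null in place of the asymptotic tests discussed in the literature.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def rbf_gram(x, sigma=1.0):
    # Gaussian (RBF) kernel matrix from pairwise squared distances.
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    # Biased HSIC estimate: trace(K H L H) / n^2, with H the centering matrix.
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n
    K = rbf_gram(x.reshape(n, -1), sigma)
    L = rbf_gram(y.reshape(n, -1), sigma)
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(0)

# Toy non-iid data: x and y depend on each other only through location s.
s = rng.uniform(0, 10, size=(200, 1))
x = np.sin(s[:, 0]) + 0.1 * rng.normal(size=200)
y = np.sin(s[:, 0]) + 0.1 * rng.normal(size=200)

# Step 1: regress each variable on s with a GP and keep the residuals.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
rx = x - gp.fit(s, x).predict(s)
ry = y - gp.fit(s, y).predict(s)

# Step 2: test the residuals for independence with HSIC; permuting one
# sample gives a null distribution for the statistic.
stat = hsic(rx, ry)
null = [hsic(rx, ry[rng.permutation(200)]) for _ in range(200)]
p_value = np.mean([t >= stat for t in null])
```

Without step 1, HSIC applied directly to `x` and `y` would report strong dependence driven entirely by the shared spatial trend; after controlling for `s`, a large `p_value` correctly suggests the residuals are independent.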



Published In

ACM Transactions on Intelligent Systems and Technology  Volume 7, Issue 2
Special Issue on Causal Discovery and Inference
January 2016
270 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/2850424
Editor: Yu Zheng
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 November 2015
Accepted: 01 July 2015
Revised: 01 April 2015
Received: 01 July 2014
Published in TIST Volume 7, Issue 2


Author Tags

  1. Gaussian process
  2. Reproducing kernel Hilbert space
  3. causal inference
  4. causal structure learning

