Regression by dependence minimization and its application to causal inference in additive noise models
Source
Proceedings of the 26th Annual International Conference on Machine Learning
ACM International Conference Proceeding Series; Vol. 382
Montreal, Quebec, Canada
Pages 745-752
Year of Publication: 2009
ISBN: 978-1-60558-516-1
Authors
Joris Mooij, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Dominik Janzing, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Jonas Peters, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Bernhard Schölkopf, Max Planck Institute for Biological Cybernetics, Tübingen, Germany
ABSTRACT
Motivated by causal inference problems, we propose a novel method for regression that minimizes the statistical dependence between regressors and residuals. The key advantage of this approach to regression is that it does not assume a particular distribution of the noise, i.e., it is non-parametric with respect to the noise distribution. We argue that the proposed regression method is well suited to the task of causal inference in additive noise models. A practical disadvantage is that the resulting optimization problem is generally non-convex and can be difficult to solve. Nevertheless, we report good results on one of the tasks of the NIPS 2008 Causality Challenge, where the goal is to distinguish causes from effects in pairs of statistically dependent variables. In addition, we propose an algorithm for efficiently inferring causal models from observational data for more than two variables. The required number of regressions and independence tests is quadratic in the number of variables, which is a significant improvement over the simple method that tests all possible DAGs.
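To make the core idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Hilbert-Schmidt Independence Criterion (HSIC) with Gaussian kernels as the dependence measure, a linear regression function, a fixed kernel bandwidth of 1.0, and a plain grid search over the slope. The abstract specifies none of these choices, and every function name below is hypothetical. The sketch fits the slope by making the residuals as independent of the regressor as possible, rather than by minimizing squared error.

```python
import math
import random


def gaussian_gram(values, bandwidth):
    """Gram matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 * bandwidth^2))."""
    scale = 2.0 * bandwidth ** 2
    return [[math.exp(-((a - b) ** 2) / scale) for b in values] for a in values]


def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(centered_kx, residuals, bandwidth=1.0):
    """Biased empirical HSIC, trace(K H L H) / n^2 (normalization is an assumption)."""
    n = len(residuals)
    gram_r = gaussian_gram(residuals, bandwidth)
    # trace(H K H L) equals the elementwise sum because both matrices are symmetric.
    total = sum(
        centered_kx[i][j] * gram_r[i][j] for i in range(n) for j in range(n)
    )
    return total / n ** 2


def fit_slope(xs, ys, candidate_slopes, bandwidth=1.0):
    """Return the candidate slope whose residuals are least dependent on xs."""
    centered_kx = center(gaussian_gram(xs, bandwidth))

    def dependence(slope):
        residuals = [y - slope * x for x, y in zip(xs, ys)]
        return hsic(centered_kx, residuals, bandwidth)

    return min(candidate_slopes, key=dependence)


# Synthetic additive noise model y = 2x + e with uniform (non-Gaussian) noise.
rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(80)]
ys = [2.0 * x + rng.uniform(-1.0, 1.0) for x in xs]

grid = [k / 10.0 for k in range(0, 41)]  # candidate slopes 0.0, 0.1, ..., 4.0
slope = fit_slope(xs, ys, grid)
print("slope with least dependent residuals:", slope)
```

The grid search is used here only because the abstract notes that the optimization problem is generally non-convex; for a one-parameter model an exhaustive scan sidesteps local minima, whereas a richer function class would need a gradient-based optimizer with restarts. The intercept is omitted because a translation-invariant kernel makes HSIC insensitive to a constant shift of the residuals.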