ABSTRACT
Gaussian graphical models (GGMs) are widely used to tackle the important and challenging problem of inferring genetic regulatory networks from expression data. These models have gained much attention because they encode full conditional relationships between variables, i.e. genes. As a consequence, structure learning of a GGM requires an invertible and well-conditioned covariance matrix. Unfortunately, the usual estimator, the sample covariance matrix, is ill-suited to the "small n, large p" setting characteristic of microarray data. As an alternative, [9] proposed a shrinkage estimator that is both statistically efficient and computationally fast. The effectiveness of this estimator in bioinformatics has been illustrated by [12], who successfully used it to infer genetic regulatory networks from microarray data. However, this improved estimator requires the shrinkage intensity to be estimated from the data, which is problematic in the "small n, large p" setting. Indeed, we show that the optimal shrinkage intensity estimator used in [9, 12] is biased. We propose a parametric bootstrap approach to estimate this bias and derive a "bias-corrected" shrinkage estimator. The applicability and usefulness of our estimator are demonstrated on both simulated and real expression data.
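The idea of shrinking the sample covariance matrix and then correcting the estimated shrinkage intensity by parametric bootstrap can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a diagonal shrinkage target with unequal variances and an intensity estimator of the form recalled from [12], and it uses the simplest bootstrap bias correction, in which the plug-in intensity is treated as the target in the bootstrap world. The function names (`shrinkage_intensity`, `shrink_cov`, `bias_corrected_intensity`) and the parameters `n_boot` and `seed` are hypothetical.

```python
import numpy as np


def shrinkage_intensity(X):
    """Estimate the shrinkage intensity for a diagonal target (assumed form).

    X is an (n, p) data matrix. Returns (lam, S), where S is the unbiased
    sample covariance and lam is clipped to [0, 1].
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # W[k, i, j] = centered product of variables i and j for sample k
    W = Xc[:, :, None] * Xc[:, None, :]
    W_bar = W.mean(axis=0)
    S = n / (n - 1) * W_bar
    # Estimated variance of each entry of S
    var_S = n / (n - 1) ** 3 * ((W - W_bar) ** 2).sum(axis=0)
    off = ~np.eye(p, dtype=bool)  # only off-diagonal entries are shrunk
    denom = (S[off] ** 2).sum()
    lam = var_S[off].sum() / denom if denom > 0 else 1.0
    return float(np.clip(lam, 0.0, 1.0)), S


def shrink_cov(X):
    """Return the shrinkage estimate lam * T + (1 - lam) * S and lam."""
    lam, S = shrinkage_intensity(X)
    T = np.diag(np.diag(S))  # diagonal target: keep variances, drop covariances
    return lam * T + (1.0 - lam) * S, lam


def bias_corrected_intensity(X, n_boot=200, seed=0):
    """Parametric-bootstrap bias correction of the intensity (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    S_star, lam_hat = shrink_cov(X)
    boots = np.empty(n_boot)
    for b in range(n_boot):
        # Simulate Gaussian data from the fitted shrinkage covariance
        Xb = rng.multivariate_normal(np.zeros(p), S_star, size=n)
        boots[b], _ = shrinkage_intensity(Xb)
    bias = boots.mean() - lam_hat  # bootstrap estimate of the bias
    lam_bc = float(np.clip(lam_hat - bias, 0.0, 1.0))
    return lam_bc, lam_hat, bias
```

The corrected intensity can then replace the plug-in value when forming the shrinkage covariance. Because the sketch materialises an n x p x p array, it is only practical for small p; a real microarray analysis would need a more memory-efficient computation.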
REFERENCES
[2] A. Dobra, C. Hans, B. Jones, J. R. Nevins, G. Yao, and M. West. Sparse graphical models for exploring gene expression data. Journal of Multivariate Analysis, 90(1):196--212, 2004. doi:10.1016/j.jmva.2004.02.009.
[3] D. Edwards. Introduction to Graphical Modelling. Springer Texts in Statistics. Springer, second edition, 2000.
[4] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, New York, 2001.
[5] W. James and C. Stein. Estimation with quadratic loss. In L. M. LeCam and J. Neyman, editors, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, pages 361--379, Berkeley, California, 1961. University of California Press.
[7] K. Kontos and G. Bontempi. Nested q-partial graphs for genetic network inference from "small n, large p" microarray data. In M. Elloumi, J. Küng, M. Linial, R. Murphy, K. Schneider, and C. Toma, editors, Proceedings of the 2nd International Conference on Bioinformatics Research and Development (BIRD 2008), number 13 in Communications in Computer and Information Science (CCIS), pages 273--287, Heidelberg, 2008. Springer.
[8] S. L. Lauritzen. Graphical Models. Oxford Statistical Science Series. Clarendon Press, Oxford, 1996.
[10] P. Magwene and J. Kim. Estimating genomic coexpression networks using first-order conditional independence. Genome Biology, 5:R100, 2004.
[12] J. Schäfer and K. Strimmer. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4(1):32, 2005.
[13] E. P. van Someren, L. F. A. Wessels, E. Backer, and M. J. T. Reinders. Genetic network modeling. Pharmacogenomics, 3(4):507--525, 2002.
[15] J. Whittaker. Graphical Models in Applied Multivariate Statistics. John Wiley and Sons, Inc., 1990.
[16] A. Wille and P. Bühlmann. Low-order conditional independence graphs for inferring genetic networks. Statistical Applications in Genetics and Molecular Biology, 5(1):Article 1, 2006.