ABSTRACT
Modeling relational data is an important problem for modern data analysis and machine learning. In this paper we propose a Bayesian model that uses a hierarchy of probabilistic assumptions about the way objects interact with one another in order to learn latent groups, their typical interaction patterns, and the degree of membership of objects to groups. Our model explains the data using a small set of parameters that can be reliably estimated with an efficient inference algorithm. In our approach, the set of probabilistic assumptions may be tailored to a specific application domain in order to incorporate intuitions and/or semantics of interest. We demonstrate our methods on simulated data and we successfully apply our model to a data set of protein-to-protein interactions.
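As a rough illustration of the kind of generative hierarchy the abstract describes, the sketch below simulates relational data from latent groups, group-to-group interaction patterns, and per-object degrees of membership. The abstract does not state the distributions used, so everything specific here is an assumption made for illustration: the Dirichlet prior on memberships, the Bernoulli interactions, and the hypothetical names `sample_dirichlet`, `sample_relations`, `alpha`, and `block`. It covers only simulation, not the inference algorithm.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_relations(n_objects, alpha, block, seed=0):
    """Simulate a binary relation matrix under the assumed process.

    alpha: assumed Dirichlet parameters, one per latent group.
    block: block[g][h] is the assumed probability that an object acting
           as a member of group g interacts with one acting as group h.
    """
    rng = random.Random(seed)
    groups = list(range(len(alpha)))
    # Degree of membership of each object in each latent group.
    membership = [sample_dirichlet(alpha, rng) for _ in range(n_objects)]
    relations = [[0] * n_objects for _ in range(n_objects)]
    for i in range(n_objects):
        for j in range(n_objects):
            if i == j:
                continue  # no self-interactions in this sketch
            # Each object picks the group it acts as for this pair.
            g = rng.choices(groups, weights=membership[i])[0]
            h = rng.choices(groups, weights=membership[j])[0]
            relations[i][j] = 1 if rng.random() < block[g][h] else 0
    return membership, relations


membership, relations = sample_relations(
    n_objects=6,
    alpha=[0.5, 0.5],
    block=[[0.9, 0.1], [0.1, 0.8]],
)
```

Small values in `alpha` push objects toward a single dominant group, while larger values spread membership more evenly; the diagonal of `block` controls how strongly members of the same group tend to interact.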
CITED BY 3
Fan Guo, Steve Hanneke, Wenjie Fu, Eric P. Xing, Recovering temporally rewiring networks: a model-based approach, Proceedings of the 24th international conference on Machine learning, p.321-328, June 20-24, 2007, Corvallis, Oregon
Huajing Li, Zaiqing Nie, Wang-Chien Lee, Lee Giles, Ji-Rong Wen, Scalable community discovery on textual data with relations, Proceedings of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA