ABSTRACT
The presence of autocorrelation provides strong motivation for using relational techniques for learning and inference. Autocorrelation is a statistical dependency between the values of the same variable on related entities and is a nearly ubiquitous characteristic of relational data sets. Recent research has explored the use of collective inference techniques to exploit this phenomenon. These techniques achieve significant performance gains by modeling observed correlations among the class labels of related instances, but the models fail to capture a frequent cause of autocorrelation: the presence of underlying groups that influence the attributes of a set of entities. We propose a latent group model (LGM) for relational data, which discovers and exploits the hidden structures responsible for the observed autocorrelation among class labels. Modeling the latent group structure improves model performance, increases inference efficiency, and enhances our understanding of the data sets. We evaluate performance on three relational classification tasks and show that LGM outperforms models that ignore latent group structure, particularly when there is little information with which to seed inference.
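As a rough illustration of the autocorrelation defined above (and not of LGM itself), the sketch below measures the Pearson correlation between the class labels at the two ends of each link. It assumes binary labels and an undirected link list; the function name `relational_autocorrelation` and the toy graph are hypothetical and do not come from the paper.

```python
def relational_autocorrelation(labels, edges):
    """Pearson correlation between the labels at the two ends of each link."""
    # Count every undirected link in both directions so the measure is symmetric.
    pairs = [(labels[u], labels[v]) for u, v in edges]
    pairs += [(y, x) for x, y in pairs]
    n = len(pairs)
    # After symmetrization both coordinates share the same mean and variance.
    mean = sum(x for x, _ in pairs) / n
    cov = sum((x - mean) * (y - mean) for x, y in pairs) / n
    var = sum((x - mean) ** 2 for x, _ in pairs) / n
    return cov / var if var > 0 else 0.0


# Toy graph (hypothetical): two tightly linked groups joined by a single link.
labels = {0: 1, 1: 1, 2: 1, 3: 0, 4: 0, 5: 0}
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

score = relational_autocorrelation(labels, edges)
print(round(score, 3))  # 0.714
```

In this toy graph the high score arises because linked entities mostly belong to the same underlying group, which is the kind of hidden structure the abstract names as a frequent cause of autocorrelation.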