ABSTRACT
We propose a novel statistical method to predict large-scale dyadic response variables in the presence of covariate information. Our approach simultaneously incorporates the effect of covariates and estimates local structure that is induced by interactions among the dyads through a discrete latent factor model. The discovered latent factors provide a predictive model that is both accurate and interpretable. We illustrate our method by working in a framework of generalized linear models, which includes commonly used regression techniques such as linear regression, logistic regression, and Poisson regression as special cases. We also provide scalable generalized EM-based algorithms for model fitting using both "hard" and "soft" cluster assignments. We demonstrate the generality and efficacy of our approach through large-scale simulation studies and the analysis of datasets obtained from real-world movie recommendation and internet advertising applications.
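To make the idea concrete, the following is a minimal sketch of the linear-regression special case with "hard" cluster assignments. It is an illustration under stated assumptions, not the authors' algorithm: it assumes a Gaussian response, models each dyad as a covariate term plus a block effect indexed by a row cluster and a column cluster, and fits by alternating least squares. The function name `fit_hard_block_model` and all parameter names are hypothetical.

```python
import numpy as np

def fit_hard_block_model(Y, X, k, l, n_iter=20, seed=0):
    """Illustrative sketch (hypothetical helper, not the paper's code).

    Y: (m, n) dyadic responses; X: (m, n, p) dyad-level covariates.
    Assumed model: Y[i, j] ~ X[i, j] @ beta + delta[row[i], col[j]].
    """
    rng = np.random.default_rng(seed)
    m, n, p = X.shape
    row = rng.integers(k, size=m)   # hard row-cluster labels
    col = rng.integers(l, size=n)   # hard column-cluster labels
    beta = np.zeros(p)
    delta = np.zeros((k, l))        # one latent effect per (row, col) block
    for _ in range(n_iter):
        # 1. Covariate step: least squares after removing block effects.
        offset = delta[row][:, col]
        beta, *_ = np.linalg.lstsq(
            X.reshape(-1, p), (Y - offset).ravel(), rcond=None)
        # 2. Block step: each block effect is the mean residual in its block.
        R = Y - X @ beta
        for a in range(k):
            for b in range(l):
                block = R[np.ix_(row == a, col == b)]
                delta[a, b] = block.mean() if block.size else 0.0
        # 3. Hard reassignment of rows to the cluster with least squared error.
        row_cost = ((R[:, None, :] - delta[:, col][None, :, :]) ** 2).sum(axis=2)
        row = row_cost.argmin(axis=1)
        # 4. Hard reassignment of columns, given the updated row labels.
        col_cost = ((R.T[:, None, :] - delta[row, :].T[None, :, :]) ** 2).sum(axis=2)
        col = col_cost.argmin(axis=1)
    return beta, delta, row, col
```

Each of the four steps minimizes the same squared-error objective over one group of unknowns, so the error never increases from one step to the next. A "soft" variant would replace the `argmin` reassignments with posterior cluster probabilities, and other generalized linear models would replace the squared error with the corresponding negative log-likelihood.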