|
ABSTRACT
Relational learning is concerned with predicting unknown values of a relation, given a database of entities and observed relations among entities. An example of relational learning is movie rating prediction, where entities could include users, movies, genres, and actors. Relations encode users' ratings of movies, movies' genres, and actors' roles in movies. A common prediction technique given one pairwise relation, for example a #users x #movies ratings matrix, is low-rank matrix factorization. In domains with multiple relations, represented as multiple matrices, we may improve predictive accuracy by exploiting information from one relation while predicting another. To this end, we propose a collective matrix factorization model: we simultaneously factor several matrices, sharing parameters among factors when an entity participates in multiple relations. Each relation can have a different value type and error distribution; so, we allow nonlinear relationships between the parameters and outputs, using Bregman divergences to measure error. We extend standard alternating projection algorithms to our model, and derive an efficient Newton update for the projection. Furthermore, we propose stochastic optimization methods to deal with large, sparse matrices. Our model generalizes several existing matrix factorization methods, and therefore yields new large-scale optimization algorithms for these problems. Our model can handle any pairwise relational schema and a wide variety of error models. We demonstrate its efficiency, as well as the benefit of sharing parameters among relations.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
D. J. Aldous. Representations for partially exchangeable arrays of random variables. J. Multi. Anal., 11(4):581--598, 1981.
|
| |
3
|
D. J. Aldous. Exchangeability and related topics, chapter 1. Springer, 1985.
|
| |
4
|
|
| |
5
|
A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In SDM. SIAM, 2007.
|
| |
6
|
|
| |
7
|
|
| |
8
|
L. Bottou and Y. LeCun. Large scale online learning. In NIPS, 2003.
|
| |
9
|
|
| |
10
|
L. Bregman. The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comp. Math and Math. Phys., 7:200--217, 1967.
|
| |
11
|
|
 |
12
|
|
| |
13
|
D. Cohn and T. Hofmann. The missing link-a probabilistic model of document content and hypertext connectivity. In NIPS, 2000.
|
| |
14
|
M. Collins, S. Dasgupta, and R. E. Schapire. A generalization of principal component analysis to the exponential family. In NIPS, 2001.
|
| |
15
|
|
| |
16
|
G. H. Golub and C. F. V. Loan. Matrix Computions. John Hopkins UP, 3rd edition, 1996.
|
| |
17
|
G. J. Gordon. Generalized2 linear2 models. In NIPS, 2002.
|
| |
18
|
|
 |
19
|
|
| |
20
|
Internet Movie Database Inc. IMDB interfaces. http://www.imdb.com/interfaces, Jan. 2007.
|
| |
21
|
D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, 2001.
|
| |
22
|
J. D. Leeuw. Block relaxation algorithms in statistics, 1994.
|
 |
23
|
Bo Long , Zhongfei (Mark) Zhang , Xiaoyun Wú , Philip S. Yu, Spectral clustering for multi-type relational data, Proceedings of the 23rd international conference on Machine learning, p.585-592, June 25-29, 2006, Pittsburgh, Pennsylvania
[doi> 10.1145/1143844.1143918]
|
 |
24
|
Bo Long , Zhongfei (Mark) Zhang , Xiaoyun Wu , Philip S. Yu, Relational clustering by symmetric convex coding, Proceedings of the 24th international conference on Machine learning, p.569-576, June 20-24, 2007, Corvalis, Oregon
[doi> 10.1145/1273496.1273568]
|
 |
25
|
|
| |
26
|
P. McCullagh and J. Nelder. Generalized Linear Models. Chapman and Hall: London., 1989.
|
| |
27
|
Netflix. Netflix prize dataset. http://www.netflixprize.com, Jan. 2007.
|
| |
28
|
J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 1999.
|
 |
29
|
|
 |
30
|
|
| |
31
|
A. P. Singh and G. J. Gordon. Relational learning via collective matrix factorization. Technical Report CMU-ML-08-109, Machine Learning Department, Carnegie Mellon University, 2008.
|
| |
32
|
N. Srebro and T. Jaakola. Weighted low-rank approximations. In ICML, 2003.
|
| |
33
|
N. Srebro, J. D. Rennie, and T. S. Jaakkola. Maximum-margin matrix factorization. In NIPS, 2004.
|
| |
34
|
P. Stoica and Y. Selen. Cyclic minimizers, majorization techniques, and the expectation-maximization algorithm: a refresher. Sig. Process. Mag., IEEE, 21(1):112--114, 2004.
|
 |
35
|
|
 |
36
|
Shipeng Yu , Kai Yu , Volker Tresp , Hans-Peter Kriegel , Mingrui Wu, Supervised probabilistic principal component analysis, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
[doi> 10.1145/1150402.1150454]
|
 |
37
|
|
|