How does high dimensionality affect collaborative filtering?
Source
ACM Conference On Recommender Systems
Proceedings of the third ACM conference on Recommender systems
New York, New York, USA
SESSION: Short papers
Pages 293-296
Year of Publication: 2009
ISBN: 978-1-60558-435-5
ABSTRACT
A crucial operation in memory-based collaborative filtering (CF) is determining the nearest neighbors (NNs) of users or items. This paper addresses two phenomena that emerge when CF algorithms perform NN search in the high-dimensional spaces typical of CF applications. The first is similarity concentration and the second is the appearance of hubs (i.e., points that appear in the $k$-NN lists of many other points). Through theoretical analysis and experimental evaluation we show that these phenomena are inherent properties of high-dimensional space, unrelated to other data properties like sparsity, and that they can impact CF algorithms by calling into question the meaning and representativeness of the discovered NNs. Moreover, we show that dimensionality reduction does not easily mitigate these phenomena. This study aims to provide a better understanding of the limitations of memory-based CF and to motivate the development of new algorithms that would overcome them.
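The two phenomena named in the abstract can be illustrated on synthetic data. The following is a minimal sketch and not the paper's experimental setup: it assumes dense i.i.d. uniform random vectors, cosine similarity, and $k = 10$. The function name `concentration_and_hubness`, the relative-spread measure (standard deviation of pairwise similarities divided by their mean), and the use of skewness of the $k$-occurrence counts as a hubness indicator are choices made for this illustration only.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarities between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / norms
    return Xn @ Xn.T


def concentration_and_hubness(n_points, n_dims, k=10, seed=0):
    """Return (relative spread of similarities, skewness of N_k, N_k).

    N_k[i] counts how many k-NN lists point i appears in. A strongly
    right-skewed N_k means a few points (hubs) appear in many lists.
    Assumption: the data are dense i.i.d. uniform vectors, so sparsity
    plays no role in this sketch.
    """
    rng = np.random.default_rng(seed)
    X = rng.random((n_points, n_dims))

    S = cosine_similarity_matrix(X)
    np.fill_diagonal(S, -np.inf)  # a point is not its own neighbor

    # Similarity concentration: spread of similarities relative to their mean.
    off_diagonal = S[np.isfinite(S)]
    relative_spread = off_diagonal.std() / off_diagonal.mean()

    # k-NN lists: indices of the k most similar points for every row.
    knn = np.argpartition(-S, k, axis=1)[:, :k]

    # k-occurrences: how often each point shows up in other points' lists.
    n_k = np.bincount(knn.ravel(), minlength=n_points)

    # Sample skewness of the k-occurrence distribution.
    centered = n_k - n_k.mean()
    skewness = (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

    return relative_spread, skewness, n_k


if __name__ == "__main__":
    for d in (3, 20, 100):
        spread, skew, n_k = concentration_and_hubness(500, d)
        print(f"d={d:4d}  relative spread={spread:.3f}  "
              f"N_k skewness={skew:.2f}  max N_k={n_k.max()}")
```

Under these assumptions, the relative spread should shrink as the dimensionality grows, which is one way of seeing similarity concentration. The skewness and maximum of the $k$-occurrence counts can be compared across dimensionalities to look for hubs; how strongly they grow depends on the data distribution and similarity measure.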