Pervasive parallelism in data mining: dataflow solution to co-clustering large and sparse Netflix data
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Paris, France
SESSION: Industrial track papers
Pages 1115-1124
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Srivatsava Daruru  The University of Texas at Austin, Austin, TX, USA
Nena M. Marin  Pervasive Software, Inc., Austin, TX, USA
Matt Walker  Pervasive Software, Inc., Austin, TX, USA
Joydeep Ghosh  The University of Texas at Austin, Austin, TX, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1557019.1557140

ABSTRACT

All Netflix Prize algorithms proposed so far are prohibitively costly for large-scale production systems. In this paper, we describe an efficient dataflow implementation of a collaborative filtering (CF) solution to the Netflix Prize problem [1] based on weighted co-clustering [5]. The dataflow library we use facilitates the development of sophisticated parallel programs designed to fully utilize commodity multicore hardware, while hiding traditional difficulties such as queuing, threading, memory management, and deadlocks.

The dataflow CF implementation first compresses the large, sparse training dataset into co-clusters. Then it generates recommendations by combining the average ratings of the co-clusters with the biases of the users and movies. When configured to identify 20x20 co-clusters in the Netflix training dataset, the implementation predicted over 100 million ratings in 16.31 minutes and achieved an RMSE of 0.88846 without any fine-tuning or domain knowledge. This is an effective real-time prediction runtime of 9.7 µs per rating, which is far superior to previously reported results. Moreover, the implemented co-clustering framework supports a wide variety of other large-scale data mining applications and forms the basis for predictive modeling on large, dyadic datasets [4, 7].
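
The prediction step described above (co-cluster average plus user and movie biases) can be illustrated with a minimal, single-threaded sketch. This is not the authors' dataflow implementation: the bias definition used here (a user's mean rating minus the mean of that user's cluster, and likewise for movies) is one common formulation from the co-clustering literature and is an assumption, as are the function names fit_means and predict, the toy ratings, the precomputed cluster assignments, and the fallback to the global mean for empty co-clusters.

```python
from collections import defaultdict


def fit_means(ratings, user_cluster, movie_cluster):
    """Compute the mean ratings needed for prediction.

    ratings: iterable of (user, movie, rating) triples.
    user_cluster / movie_cluster: dicts mapping ids to cluster indices
    (assumed to be given; the clustering step itself is not shown).
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [sum of ratings, count]

    def add(key, rating):
        entry = sums[key]
        entry[0] += rating
        entry[1] += 1

    for u, m, r in ratings:
        g, h = user_cluster[u], movie_cluster[m]
        add(("global",), r)
        add(("cocluster", g, h), r)
        add(("user", u), r)
        add(("movie", m), r)
        add(("user_cluster", g), r)
        add(("movie_cluster", h), r)
    return {key: total / count for key, (total, count) in sums.items()}


def predict(means, user_cluster, movie_cluster, u, m):
    """Co-cluster average plus user bias plus movie bias (assumed form)."""
    g, h = user_cluster[u], movie_cluster[m]
    # Assumption: an empty co-cluster falls back to the global mean.
    base = means.get(("cocluster", g, h), means[("global",)])
    user_bias = means[("user", u)] - means[("user_cluster", g)]
    movie_bias = means[("movie", m)] - means[("movie_cluster", h)]
    return base + user_bias + movie_bias


# Hypothetical toy data, not taken from the Netflix dataset.
ratings = [
    ("alice", "A", 5), ("alice", "B", 3),
    ("bob", "A", 3),
    ("carol", "C", 2), ("carol", "B", 1),
]
user_cluster = {"alice": 0, "bob": 0, "carol": 1}
movie_cluster = {"A": 0, "B": 1, "C": 1}

means = fit_means(ratings, user_cluster, movie_cluster)
pred = predict(means, user_cluster, movie_cluster, "bob", "B")
print(round(pred, 4))
```

Once the means are stored, each prediction needs only a handful of lookups and additions, which is consistent with the per-rating cost in the microsecond range reported in the abstract.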


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Netflix Inc. Netflix data. http://www.netflixprize.com//download, Oct 2006.
 
2
Pervasive Software Inc. Pervasive DataRush. http://www.pervasivedatarush.com/downloads, 2007.
 
3
Timely Development. Netflix Prize. http://www.timelydevelopment.com/demos/NetflixPrize.aspx, 2007.
4
 
5
6
7
 
8
 
9
 
10
 
11
G. Golub and C. V. Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 1996.
 
12
A. Grama, A. Gupta, G. Karypis, and V. Kumar. Introduction to Parallel Computing. Addison-Wesley, 2003.
 
13
K. Irwin and M. Walker. Four Paths to Java Parallelism. Java Developer's Journal, volume 13, 2008.
 
14
G. Kahn. The semantics of a simple language for parallel programming. Proc. of the IFIP Congress 74, North-Holland Publishing Co., Holland, 1974.
15
 
16
E. Lee and T. Parks. Dataflow process networks. Proceedings of the IEEE, 83(5):773--801, May 1995.
 
17
B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Application of dimensionality reduction in recommender system - a case study. Proc. WebKDD'00, 2000.
 
18
S. Funk. Netflix update: Try this at home. http://sifter.org/~simon/journal/20061211.html, 2006.
 
19
