MapReduce: simplified data processing on large clusters
Source
Communications of the ACM archive
Volume 51, Issue 1 (January 2008)
50th anniversary issue: 1958 - 2008
SPECIAL ISSUE: Breakthrough research: a preview of things to come
Pages 107-113  
Year of Publication: 2008
ISSN:0001-0782
Authors
Jeffrey Dean, Google, Mountain View, CA
Sanjay Ghemawat, Google, Mountain View, CA
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 296,   Downloads (12 Months): 2174,   Citation Count: 32
DOI: http://doi.acm.org/10.1145/1327452.1327492

APPENDICES and SUPPLEMENTS
p107-dean.jp.pdf (360 KB), Japanese CACM Collection


ABSTRACT

MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
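
The division of labor described in the abstract can be illustrated with a minimal single-process sketch in Python. This is not Google's implementation and shows none of the parallelization, fault handling, or scheduling that the runtime provides; it only shows the shape of a user-supplied map function and reduce function. The names `map_words`, `reduce_counts`, and `run_mapreduce`, the word-count task, and the sample documents are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(doc_name: str, text: str) -> Iterator[tuple[str, int]]:
    """User-supplied map function (hypothetical): emit (word, 1) per word."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> int:
    """User-supplied reduce function (hypothetical): sum counts for one word."""
    return sum(counts)


def run_mapreduce(
    inputs: dict[str, str],
    map_fn: Callable[[str, str], Iterable[tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], int],
) -> dict[str, int]:
    """Toy sequential stand-in for the runtime (an assumption, not the real
    system): apply map_fn to every input, group the intermediate values by
    key, then apply reduce_fn to each group."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


# Made-up sample input: document name -> document contents.
docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
word_counts = run_mapreduce(docs, map_words, reduce_counts)
print(word_counts)
```

In this sketch the user writes only `map_words` and `reduce_counts`; everything inside `run_mapreduce` stands in for the work that, according to the abstract, the runtime system performs automatically across a cluster.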



REVIEW

"Chris A Mattmann : Reviewer"

Google has revolutionized the way that large-scale data management is engineered and deployed, and evolves over time. In particular, it developed novel methods for file-based data management for rapid indexing and searching of Web pages (PageRank ...
