The potential of the cell processor for scientific computing
Source: Conference on Computing Frontiers
Proceedings of the 3rd Conference on Computing Frontiers
Ischia, Italy
SESSION: Multithreaded, multicore, and SoC systems
Pages: 9-20
Year of Publication: 2006
ISBN: 1-59593-302-6
Authors
Samuel Williams  Lawrence Berkeley National Laboratory, Berkeley, CA
John Shalf  Lawrence Berkeley National Laboratory, Berkeley, CA
Leonid Oliker  Lawrence Berkeley National Laboratory, Berkeley, CA
Shoaib Kamil  Lawrence Berkeley National Laboratory, Berkeley, CA
Parry Husbands  Lawrence Berkeley National Laboratory, Berkeley, CA
Katherine Yelick  Lawrence Berkeley National Laboratory, Berkeley, CA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1128022.1128027

ABSTRACT

The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high-performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the forthcoming STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key scientific computing kernels: dense matrix multiply, sparse matrix-vector multiply, stencil computations, and 1D/2D FFTs. The difficulty of programming Cell, which requires assembly-level intrinsics for the best performance, makes this model useful as an initial step in algorithm design and evaluation. Next, we validate the accuracy of our model by comparing results against published hardware results, as well as our own implementations on the Cell full system simulator. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different mappings of the kernels and demonstrates a simple and effective programming model for Cell's unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.
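The abstract describes a performance model applied to kernels such as dense matrix multiply and sparse matrix-vector multiply, but does not give its details. As a rough illustration of what a kernel-level performance estimate can look like, the sketch below uses a generic bound-based estimate: run time is taken as the larger of compute time and memory-transfer time, assuming the two overlap perfectly. This is not the authors' model. The names `Machine`, `Kernel`, `estimate_seconds`, `dense_matmul`, and `spmv_csr` are hypothetical, the traffic formulas are simplifying assumptions, and the machine numbers are placeholders rather than Cell specifications or measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine description (placeholder numbers, not Cell specs)."""
    peak_gflops: float    # peak arithmetic rate in GFLOP/s
    bandwidth_gbs: float  # sustained memory bandwidth in GB/s


@dataclass(frozen=True)
class Kernel:
    """Work and main-memory traffic of one kernel invocation."""
    flops: float        # floating-point operations performed
    bytes_moved: float  # bytes transferred to and from main memory


def estimate_seconds(kernel: Kernel, machine: Machine) -> float:
    """Run-time estimate assuming compute and memory transfers overlap fully."""
    compute_s = kernel.flops / (machine.peak_gflops * 1e9)
    memory_s = kernel.bytes_moved / (machine.bandwidth_gbs * 1e9)
    return max(compute_s, memory_s)


def estimate_gflops(kernel: Kernel, machine: Machine) -> float:
    """Achieved GFLOP/s implied by the run-time estimate."""
    return kernel.flops / estimate_seconds(kernel, machine) / 1e9


def limiting_resource(kernel: Kernel, machine: Machine) -> str:
    """Return 'compute' or 'memory', whichever term dominates the estimate."""
    compute_s = kernel.flops / (machine.peak_gflops * 1e9)
    memory_s = kernel.bytes_moved / (machine.bandwidth_gbs * 1e9)
    return "compute" if compute_s >= memory_s else "memory"


def dense_matmul(n: int, word_bytes: int = 8) -> Kernel:
    """n x n matrix multiply: 2*n^3 flops.

    Assumption: ideal blocking, so A and B are read once and C written once.
    """
    return Kernel(flops=2.0 * n**3, bytes_moved=3.0 * n**2 * word_bytes)


def spmv_csr(nnz: int, nrows: int, word_bytes: int = 8, index_bytes: int = 4) -> Kernel:
    """Sparse matrix-vector multiply in compressed row storage: 2*nnz flops.

    Assumption: each nonzero costs one value and one column index; each row
    costs one row pointer, one source-vector word, and one result word.
    """
    traffic = nnz * (word_bytes + index_bytes) + nrows * (2 * word_bytes + index_bytes)
    return Kernel(flops=2.0 * nnz, bytes_moved=float(traffic))


# Placeholder machine, chosen only to make the arithmetic easy to follow.
example_machine = Machine(peak_gflops=100.0, bandwidth_gbs=25.0)

if __name__ == "__main__":
    for name, k in [("dense matmul", dense_matmul(1000)),
                    ("SpMV (CSR)", spmv_csr(nnz=1_000_000, nrows=100_000))]:
        print(name, limiting_resource(k, example_machine),
              round(estimate_gflops(k, example_machine), 2))
```

Under these assumptions the dense kernel is limited by arithmetic throughput while the sparse kernel is limited by memory bandwidth, which is the kind of distinction a kernel-level model is meant to expose before any hand-tuned code is written.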


