Hybrid access-specific software cache techniques for the Cell BE architecture
Source
PACT: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques
Toronto, Ontario, Canada
Session: Programming the memory hierarchy
Pages 292-302
Year of Publication: 2008
ISBN: 978-1-60558-282-5

Authors
Marc Gonzàlez, Barcelona Supercomputing Center, Barcelona, Spain
Nikola Vujic, Barcelona Supercomputing Center, Barcelona, Spain
Xavier Martorell, Barcelona Supercomputing Center, Barcelona, Spain
Eduard Ayguadé, Barcelona Supercomputing Center, Barcelona, Spain
Alexandre E. Eichenberger, T.J. Watson IBM Research Center, Yorktown Heights, NY, USA
Tong Chen, T.J. Watson IBM Research Center, Yorktown Heights, NY, USA
Zehra Sura, T.J. Watson IBM Research Center, Yorktown Heights, NY, USA
Tao Zhang, T.J. Watson IBM Research Center, Yorktown Heights, NY, USA
Kevin O'Brien, T.J. Watson IBM Research Center, Yorktown Heights, NY, USA
Kathryn O'Brien, T.J. Watson IBM Research Center, Yorktown Heights, NY, USA

ABSTRACT
The difficulty of programming is one of the main impediments to the broad acceptance of multi-core systems that lack hardware support for transparent data transfer between local and global memories. A software cache is a robust way to give the user a transparent view of the memory architecture, but this software approach can suffer from poor performance. In this paper, we propose a hierarchical, hybrid software-cache architecture that classifies memory accesses at compile time into two classes: high-locality and irregular. Our approach then steers each memory reference toward one of two specific cache structures, each optimized for its respective access pattern. These cache structures are designed to let high-level compiler optimizations aggressively unroll loops, reorder cache references, and/or transform surrounding loops so as to practically eliminate the software-cache overhead in the innermost loop. Performance evaluation indicates that the optimized software-cache structures, combined with the proposed code optimizations, yield speedup factors of 3.5 to 8.4 compared to a traditional software-cache approach. As a result, we demonstrate that the Cell BE processor can be a competitive alternative to a modern server-class multi-core such as the IBM Power5 processor for a set of parallel NAS applications.
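The core idea behind eliminating software-cache overhead in the innermost loop, that a stride-1 (high-locality) reference can pay for one cache lookup per line instead of one per access, can be illustrated with a minimal C sketch. It assumes a single direct-mapped cache, uses memcpy in place of SPE DMA transfers, and relies on hypothetical names and parameters (soft_cache, cache_lookup, dma_get, LINE_BYTES, NUM_LINES, and the sum_* functions) that are not taken from the paper. It is not the authors' hybrid design: their two cache structures are specialized per access class, whereas here irregular references simply use a separate instance of the same cache to mimic the steering of the two classes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters chosen for illustration; they are not the paper's. */
#define LINE_BYTES      128                        /* bytes per software-cache line */
#define FLOATS_PER_LINE (LINE_BYTES / sizeof(float))
#define NUM_LINES       64                         /* direct-mapped: 64 x 128 B = 8 KB */
#define N               4096                       /* elements in the simulated global array */

/* Stand-in for main memory; line-aligned so a fetched line never runs past the array. */
static _Alignas(LINE_BYTES) float global_mem[N];

static unsigned long lookup_count;  /* tag checks performed */
static unsigned long dma_count;     /* simulated DMA transfers (line fills) */

typedef struct {
    uintptr_t tag[NUM_LINES];                /* global line address held in each slot */
    int valid[NUM_LINES];                    /* 1 if the slot holds a line */
    float data[NUM_LINES][FLOATS_PER_LINE];  /* local-store copy of each line */
} soft_cache;

static void cache_reset(soft_cache *c) { memset(c->valid, 0, sizeof c->valid); }

/* Simulated DMA get: a real SPE would issue an MFC transfer; memcpy stands in here. */
static void dma_get(void *local, const void *global, size_t n) {
    memcpy(local, global, n);
    dma_count++;
}

/* Generic lookup: tag check, fill the line on a miss, return a local-store pointer. */
static float *cache_lookup(soft_cache *c, const float *gaddr) {
    uintptr_t a = (uintptr_t)gaddr;
    uintptr_t line = a & ~(uintptr_t)(LINE_BYTES - 1);
    size_t slot = (size_t)((line / LINE_BYTES) % NUM_LINES);
    lookup_count++;
    if (!c->valid[slot] || c->tag[slot] != line) {  /* miss: fetch the whole line */
        dma_get(c->data[slot], (const void *)line, LINE_BYTES);
        c->tag[slot] = line;
        c->valid[slot] = 1;
    }
    return (float *)((char *)c->data[slot] + (a - line));
}

/* Traditional software cache: every reference in the inner loop pays a tag check. */
static float sum_traditional(soft_cache *c, const float *a, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += *cache_lookup(c, &a[i]);
    return s;
}

/* High-locality (stride-1) reference: one lookup per line, then plain local-store
   loads. The returned pointer stays valid because no other lookup runs in between. */
static float sum_high_locality(soft_cache *c, const float *a, size_t n) {
    assert((uintptr_t)a % LINE_BYTES == 0);  /* sketch assumes a line-aligned base */
    float s = 0.0f;
    for (size_t i = 0; i < n; i += FLOATS_PER_LINE) {
        const float *local = cache_lookup(c, &a[i]);
        size_t m = (n - i < FLOATS_PER_LINE) ? n - i : FLOATS_PER_LINE;
        for (size_t k = 0; k < m; k++)       /* inner loop: no tag checks */
            s += local[k];
    }
    return s;
}

/* Irregular (indirect) reference: a[idx[i]] has no predictable locality, so each
   access keeps its own lookup; callers pass a separate cache instance for it. */
static float sum_irregular(soft_cache *c, const float *a, const size_t *idx, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += *cache_lookup(c, &a[idx[i]]);
    return s;
}
```

With 128-byte lines holding 32 floats, summing the 4,096-element array through sum_traditional performs 4,096 tag checks, while sum_high_locality performs 128, one per line; both fetch the same 128 lines. The per-line loop is the kind of code a compiler could produce once it knows a reference is stride-1.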