Hybrid access-specific software cache techniques for the Cell BE architecture
Source
PACT: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques
Toronto, Ontario, Canada
Session: Programming the memory hierarchy
Pages 292-302
Year of Publication: 2008
ISBN: 978-1-60558-282-5
Authors
Marc Gonzàlez  Barcelona Supercomputing Center, Barcelona, Spain
Nikola Vujic  Barcelona Supercomputing Center, Barcelona, Spain
Xavier Martorell  Barcelona Supercomputing Center, Barcelona, Spain
Eduard Ayguadé  Barcelona Supercomputing Center, Barcelona, Spain
Alexandre E. Eichenberger  T.J. Watson IBM Research Center, Yorktown Heights, NY, USA
Tong Chen  T.J. Watson IBM Research Center, Yorktown Heights, NY, USA
Zehra Sura  T.J. Watson IBM Research Center, Yorktown Heights, NY, USA
Tao Zhang  T.J. Watson IBM Research Center, Yorktown Heights, NY, USA
Kevin O'Brien  T.J. Watson IBM Research Center, Yorktown Heights, NY, USA
Kathryn O'Brien  T.J. Watson IBM Research Center, Yorktown Heights, NY, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1454115.1454156

ABSTRACT

Programming difficulty is one of the main impediments to the broad acceptance of multi-core systems that have no hardware support for transparent data transfer between local and global memories. A software cache is a robust way to give the user a transparent view of the memory architecture, but it can suffer from poor performance. In this paper, we propose a hierarchical, hybrid software-cache architecture that classifies memory accesses at compile time into two classes: high-locality and irregular. Our approach then steers each memory reference toward one of two cache structures, each optimized for its access pattern. These cache structures are designed so that high-level compiler optimizations can aggressively unroll loops, reorder cache references, and/or transform surrounding loops, practically eliminating the software cache overhead in the innermost loop. Performance evaluation indicates that the optimized software-cache structures, combined with the proposed code optimizations, yield speedup factors of 3.5 to 8.4 over a traditional software cache approach. As a result, we demonstrate that the Cell BE processor can be a competitive alternative to a modern server-class multi-core such as the IBM Power5 processor for a set of parallel NAS applications.
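
The idea of removing cache overhead from the innermost loop can be illustrated with a toy model. The sketch below is not the paper's cache design or its compiler transformation: the class ToySoftwareCache, the constant LINE_WORDS, and both summation functions are hypothetical names chosen for illustration, and a Python list stands in for global memory. It contrasts a traditional scheme that performs a cache lookup on every access with a high-locality scheme in which the lookup is hoisted so that it runs once per cache line.

```python
LINE_WORDS = 16  # hypothetical cache-line size, in array elements


class ToySoftwareCache:
    """Direct-mapped software cache; a Python list stands in for global memory."""

    def __init__(self, memory, num_lines=8):
        self.memory = memory
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # line number held by each slot
        self.lines = [None] * num_lines  # cached copy of each line
        self.lookups = 0                 # tag checks performed
        self.misses = 0                  # lines fetched from "global memory"

    def lookup_line(self, index):
        """Return the cached line holding memory[index], fetching it on a miss."""
        self.lookups += 1
        line_no = index // LINE_WORDS
        slot = line_no % self.num_lines
        if self.tags[slot] != line_no:
            self.misses += 1
            start = line_no * LINE_WORDS
            self.lines[slot] = self.memory[start:start + LINE_WORDS]
            self.tags[slot] = line_no
        return self.lines[slot]

    def read(self, index):
        """Traditional access path: one lookup per element."""
        return self.lookup_line(index)[index % LINE_WORDS]


def sum_per_access(cache, n):
    """Sum memory[0:n] with a cache lookup inside the innermost loop."""
    total = 0
    for i in range(n):
        total += cache.read(i)
    return total


def sum_hoisted(cache, n):
    """Sum memory[0:n] with the lookup hoisted out of the innermost loop."""
    total = 0
    for start in range(0, n, LINE_WORDS):
        line = cache.lookup_line(start)       # one lookup per cache line
        count = min(LINE_WORDS, n - start)    # the last line may be partial
        for offset in range(count):           # inner loop has no cache overhead
            total += line[offset]
    return total
```

For a 100-element array with 16-element lines, sum_per_access performs 100 lookups while sum_hoisted performs 7, and both fetch the same 7 lines. An irregular access pattern cannot be restructured this way, which is one motivation for routing it to a separate cache structure with a per-access lookup such as read.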



