ABSTRACT
We present a compiler for machines with an explicitly managed memory hierarchy and suggest that a primary role of any compiler for such architectures is to manipulate and schedule a hierarchy of bulk operations at varying scales of the application and of the machine. We evaluate the performance of our compiler using several benchmarks running on a Cell processor.
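Here is a rough illustration of what scheduling bulk operations can mean on an explicitly managed memory such as the Cell's local stores. The following Python sketch stages a SAXPY computation (y := a*x + y) through two small local buffers in fixed-size tiles. It issues the transfer for the next tile before computing on the current one, which is classic double buffering. This is a hypothetical model, not the compiler described above. The function name `saxpy_double_buffered`, the `tile` parameter, and the get/compute/put log are assumptions made for illustration. The transfers are simulated sequentially with list slicing rather than issued as asynchronous DMA.

```python
from typing import List, Optional, Tuple

Tile = Tuple[List[float], List[float]]


def saxpy_double_buffered(
    a: float, x: List[float], y: List[float], tile: int
) -> Tuple[List[float], List[Tuple[str, int]]]:
    """Hypothetical sketch: compute a*x + y by moving data in bulk tiles.

    Returns the result vector and the ordered log of bulk operations issued.
    """
    if len(x) != len(y) or tile <= 0:
        raise ValueError("x and y must have equal length and tile must be positive")
    starts = list(range(0, len(x), tile))
    out = list(y)                       # "main memory" that receives results
    log: List[Tuple[str, int]] = []
    buffers: List[Optional[Tile]] = [None, None]  # two local-store slots

    def get(k: int) -> None:
        # Simulated bulk transfer: main memory -> local slot k % 2.
        s = starts[k]
        buffers[k % 2] = (x[s:s + tile], y[s:s + tile])
        log.append(("get", k))

    if starts:
        get(0)                          # prime the pipeline with the first tile
    for k, s in enumerate(starts):
        if k + 1 < len(starts):
            get(k + 1)                  # prefetch next tile into the other slot
        current = buffers[k % 2]
        assert current is not None
        xs, ys = current
        result = [a * xi + yi for xi, yi in zip(xs, ys)]
        log.append(("compute", k))
        out[s:s + len(result)] = result  # simulated bulk transfer back to main memory
        log.append(("put", k))
    return out, log


if __name__ == "__main__":
    out, log = saxpy_double_buffered(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [10.0] * 5, 2)
    print(out)  # [12.0, 14.0, 16.0, 18.0, 20.0]
    print(log)
```

With a deeper hierarchy, the same pattern would nest. Each tile handled at one level could itself be split into smaller bulk transfers at the next level down, with a separate schedule at each scale.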