ABSTRACT
We present Sequoia, a programming language designed to facilitate the development of memory-hierarchy-aware parallel programs that remain portable across modern machines with different memory hierarchy configurations. Sequoia abstractly exposes hierarchical memory in the programming model and provides language mechanisms to describe communication vertically through the machine and to localize computation to particular memory locations within it. We have implemented a complete programming system, including a compiler and runtime systems for Cell processor-based blade systems and distributed memory clusters, and demonstrate efficient performance running Sequoia programs on both of these platforms.