A portable runtime interface for multi-level memory hierarchies
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Salt Lake City, UT, USA
SESSION: Runtime systems
Pages 143-152
Year of Publication: 2008
ISBN: 978-1-59593-795-7
Authors
Mike Houston, Stanford University, Stanford, CA, USA
Ji-Young Park, Stanford University, Stanford, CA, USA
Manman Ren, Stanford University, Stanford, CA, USA
Timothy Knight, Stanford University, Stanford, CA, USA
Kayvon Fatahalian, Stanford University, Stanford, CA, USA
Alex Aiken, Stanford University, Stanford, CA, USA
William Dally, Stanford University, Stanford, CA, USA
Pat Hanrahan, Stanford University, Stanford, CA, USA
ABSTRACT
We present a platform-independent runtime interface for moving data and computation through parallel machines with multi-level memory hierarchies. We show that this interface can be used as a compiler target and can be implemented easily and efficiently on a variety of platforms. The interface design allows us to compose multiple runtimes, achieving portability across machines with multiple memory levels. We demonstrate portability of programs across machines with two memory levels using runtime implementations for multi-core/SMP machines, the STI Cell Broadband Engine, a distributed memory cluster, and disk systems. We also demonstrate portability across machines with multiple memory levels by composing runtimes and running on a cluster of SMP nodes, running out-of-core algorithms on a Sony PlayStation 3 that pulls data from disk, and running on a cluster of Sony PlayStation 3s. With this uniform interface, we achieve good performance for our applications and maximize the use of bandwidth and computational resources on these system configurations.
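The composition idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's interface, which the abstract does not spell out; every name in it (`Level`, `Runtime`, `run`, `leaf_scale`) is hypothetical. It assumes only that a runtime sits between a parent memory level and a child level, copies blocks of data down, runs a task on the child, and returns the results, so that a task may itself be another runtime one level deeper.

```python
from typing import Callable, Dict, List


class Level:
    """One memory level (e.g. disk, DRAM, local store). Hypothetical model."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity              # max elements resident at once
        self.arrays: Dict[str, List[float]] = {}

    def alloc(self, key: str, data: List[float]) -> None:
        used = sum(len(a) for a in self.arrays.values())
        if used + len(data) > self.capacity:
            raise MemoryError(f"{self.name}: block does not fit")
        self.arrays[key] = list(data)         # copy, modelling a transfer

    def free(self, key: str) -> None:
        del self.arrays[key]


# A task runs against data resident in one level and returns its output.
Task = Callable[[Level, str], List[float]]


class Runtime:
    """Moves blocks from a parent level to a child level and runs a task there."""

    def __init__(self, parent: Level, child: Level, block: int):
        self.parent, self.child, self.block = parent, child, block
        self.transfers = 0                    # number of blocks copied down

    def run(self, key: str, task: Task) -> List[float]:
        src = self.parent.arrays[key]
        out: List[float] = []
        for start in range(0, len(src), self.block):
            self.child.alloc("blk", src[start:start + self.block])
            self.transfers += 1
            try:
                out.extend(task(self.child, "blk"))   # compute at child level
            finally:
                self.child.free("blk")
        return out


def leaf_scale(level: Level, key: str) -> List[float]:
    """Leaf computation: double every element of the resident block."""
    return [2.0 * x for x in level.arrays[key]]


# Three levels, two runtimes composed: disk -> dram -> local.
disk = Level("disk", capacity=1000)
dram = Level("dram", capacity=8)
local = Level("local", capacity=2)
disk.alloc("x", [float(i) for i in range(10)])

outer = Runtime(disk, dram, block=4)
inner = Runtime(dram, local, block=2)

# The task handed to the outer runtime is the inner runtime itself.
result = outer.run("x", lambda level, key: inner.run(key, leaf_scale))
print(result)
```

In this toy model the leaf task never learns how many levels sit above it, which is the portability property the abstract attributes to composing runtimes; a real implementation would also overlap transfers with computation, which the sketch does not attempt.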
CITED BY 3
Manman Ren, Ji Young Park, Mike Houston, Alex Aiken, William J. Dally, A tuning framework for software-managed memory hierarchies, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
Jeremy S. Meredith, Gonzalo Alvarez, Thomas A. Maier, Thomas C. Schulthess, Jeffrey S. Vetter, Accuracy and performance of graphics processors: A Quantum Monte Carlo application case study, Parallel Computing, v.35 n.3, p.151-163, March, 2009