A tuning framework for software-managed memory hierarchies
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT)
Toronto, Ontario, Canada
SESSION: Programming the memory hierarchy
Pages 280-291
Year of Publication: 2008
ISBN: 978-1-60558-282-5
Authors
Manman Ren, Stanford University, Stanford, CA, USA
Ji Young Park, Stanford University, Stanford, CA, USA
Mike Houston, Stanford University, Stanford, CA, USA
Alex Aiken, Stanford University, Stanford, CA, USA
William J. Dally, Stanford University, Stanford, CA, USA
ABSTRACT
Achieving good performance on a modern machine with a multi-level memory hierarchy, and in particular on a machine with software-managed memories, requires precise tuning of programs to the machine's particular characteristics. A large program on a multi-level machine can easily expose tens or hundreds of interdependent parameters that require tuning, and manually searching the resulting large, non-linear space of program parameters is a tedious process of trial and error. In this paper we present a general framework for automatically tuning general applications to machines with software-managed memory hierarchies. We evaluate our framework by measuring the performance of benchmarks that are tuned for a range of machines with different memory hierarchy configurations: a cluster of Intel P4 Xeon processors, a single Cell processor, and a cluster of Sony PlayStation 3s.