ABSTRACT
The Roofline model offers insight into how to improve the performance of software and hardware.