| Optimization of sparse matrix-vector multiplication on emerging multicore platforms |
| Full text |
Pdf
(800 KB)
|
Source
|
Conference on High Performance Networking and Computing
archive
Proceedings of the 2007 ACM/IEEE conference on Supercomputing - Volume 00
table of contents
Reno, Nevada
SESSION: Benchmarking
table of contents
Article No. 38
Year of Publication: 2007
ISBN:978-1-59593-764-3
|
|
Authors
|
|
Samuel Williams
|
Lawrence Berkeley National Laboratory, Berkeley, CA and University of California at Berkeley, Berkeley, CA
|
|
Leonid Oliker
|
Lawrence Berkeley National Laboratory, Berkeley, CA
|
|
Richard Vuduc
|
Lawrence Livermore National Laboratory, Livermore, CA
|
|
John Shalf
|
Lawrence Berkeley National Laboratory, Berkeley, CA
|
|
Katherine Yelick
|
Lawrence Berkeley National Laboratory, Berkeley, CA and University of California at Berkeley, Berkeley, CA
|
|
James Demmel
|
University of California at Berkeley, Berkeley, CA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 36, Downloads (12 Months): 288, Citation Count: 12
|
|
|
ABSTRACT
We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
K. Asanovic, R. Bodik, B. Catanzaro, et al. The landscape of parallel computing research: A view from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, December 2006.
|
| |
2
|
D. Bailey. Little's law and high performance computing. In RNR Technical Report, 1997.
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
R. Geus and S. Röllin. Towards a fast parallel sparse matrix-vector multiplication. In E. H. D'Hollander, J. R. Joubert, F. J. Peters, and H. Sips, editors, Proceedings of the International Conference on Parallel Computing (ParCo), pages 308--315. Imperial College Press, 1999.
|
 |
7
|
|
| |
8
|
Michael Gschwind , H. Peter Hofstee , Brian Flachs , Martin Hopkins , Yukio Watanabe , Takeshi Yamazaki, Synergistic Processing in Cell's Multicore Architecture, IEEE Micro, v.26 n.2, p.10-24, March 2006
[doi> 10.1109/MM.2006.41]
|
| |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
J. Mellor-Crummey and J. Garvin. Optimizing sparse matrix vector multiply using unroll-and-jam. In Proc. LACSI Symposium, Santa Fe, NM, USA, October 2002.
|
| |
13
|
|
 |
14
|
|
| |
15
|
D. J. Rose. A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations. Graph Theory and Computing, pages 183--217, 1973.
|
| |
16
|
|
| |
17
|
S. Toledo. Improving memory-system performance of sparse matrix-vector multiplication. In Eighth SIAM Conference on Parallel Processing for Scientific Computing, March 1997.
|
| |
18
|
|
| |
19
|
|
| |
20
|
R. Vuduc, J. W. Demmel, and K. A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proc. SciDAC 2005, Journal of Physics: Conference Series, San Francisco, CA, June 2005.
|
| |
21
|
R. Vuduc, A. Gyulassy, J. W. Demmel, and K. A. Yelick. Memory hierarchy optimizations and bounds for sparse ATAx. In Proceedings of the ICCS Workshop on Parallel Linear Algebra, volume LNCS, Melbourne, Australia, June 2003. Springer.
|
| |
22
|
R. Vuduc, S. Kamil, J. Hsu, R. Nishtala, J. W. Demmel, and K. A. Yelick. Automatic performance tuning and analysis of sparse triangular solve. In ICS 2002: Workshop on Performance Optimization via High-Level Languages and Libraries, New York, USA, June 2002.
|
 |
23
|
|
| |
24
|
J. W. Willenbring, A. A. Anda, and M. Heroux. Improving sparse matrix-vector product kernel performance and availabillity. In Proc. Midwest Instruction and Computing Symposium, Mt. Pleasant, IA, 2006.
|
| |
25
|
Samuel Williams , John Shalf , Leonid Oliker , Shoaib Kamil , Parry Husbands , Katherine Yelick, Scientific computing Kernels on the cell processor, International Journal of Parallel Programming, v.35 n.3, p.263-298, June 2007
[doi> 10.1017/s10766-007-0034-5]
|
CITED BY 12
|
|
|
|
|
|
|
|
|
|
|
Kaushik Datta , Mark Murphy , Vasily Volkov , Samuel Williams , Jonathan Carter , Leonid Oliker , David Patterson , John Shalf , Katherine Yelick, Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, Proceedings of the 2008 ACM/IEEE conference on Supercomputing, November 15-21, 2008, Austin, Texas
|
|
|
|
|
|
|
|
|
Samuel Williams , Leonid Oliker , Richard Vuduc , John Shalf , Katherine Yelick , James Demmel, Optimization of sparse matrix-vector multiplication on emerging multicore platforms, Parallel Computing, v.35 n.3, p.178-194, March, 2009
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Samuel Williams , Jonathan Carter , Leonid Oliker , John Shalf , Katherine Yelick, Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms, Journal of Parallel and Distributed Computing, v.69 n.9, p.762-777, September, 2009
|
|