|
ABSTRACT
Pattern-based Representation (PBR) is a novel approach to improving the performance of Sparse Matrix-Vector Multiply (SMVM) numerical kernels. Motivated by our observation that many matrices can be divided into blocks that share a small number of distinct patterns, we generate custom multiplication kernels for frequently recurring block patterns. The resulting reduction in index overhead significantly reduces memory bandwidth requirements and improves performance. Unlike existing methods, PBR requires neither detection of dense blocks nor zero filling, making it particularly advantageous for matrices that lack dense nonzero concentrations. SMVM kernels for PBR can benefit from explicit prefetching and vectorization, and are amenable to parallelization. We present sequential and parallel performance results for PBR on two current multicore architectures, which show that PBR outperforms available alternatives for the matrices to which it is applicable.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
R. C. Agarwal , F. G. Gustavson , M. Zubair, A high performance algorithm using pre-processing for the sparse matrix-vector multiplication, Proceedings of the 1992 ACM/IEEE conference on Supercomputing, p.32-41, November 16-20, 1992, Minneapolis, Minnesota, United States
|
| |
2
|
A. Antoulas, D. Sorensen, and S. Gugercin. A survey of model reduction methods for large-scale systems. In Structured Matrices in Operator Theory, Numerical Analysis, Control, Signal and Image Processing, Contemporary Mathematics, number 280, pages 193--219. AMS Publications, 2001.
|
| |
3
|
C. Beattie and S. Gugercin. Inexact solves in krylov-based model reduction. In Proceedings of the 45th IEEE Conference on Decision and Control, pages 3405--3411, December 2006.
|
 |
4
|
|
| |
5
|
|
| |
6
|
T. Davis. The University of Florida sparse matrix collection. http://www.cise.u .edu/research/sparse/matrices.
|
| |
7
|
E. F. D'Azevedo, M. R. Fahey, and R. T. Mills. Vectorized sparse matrix multiply for compressed row storage format. In International Conference on Computational Science, May 2005.
|
 |
8
|
Y. Fukui , H. Yoshida , S. Higono, Supercomputing of circuits simulation, Proceedings of the 1989 ACM/IEEE conference on Supercomputing, p.81-85, November 12-17, 1989, Reno, Nevada, United States
[doi> 10.1145/76263.76272]
|
| |
9
|
|
| |
10
|
T. Gneiting and A. E. Raftery. Weather forecasting with ensemble methods. Science (Washington, D.C.), 310(5746):248, 2005.
|
| |
11
|
Georgios Goumas , Kornilios Kourtis , Nikos Anastopoulos , Vasileios Karakasis , Nectarios Koziris, Understanding the Performance of Sparse Matrix-Vector Multiplication, Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), p.283-292, February 13-15, 2008
[doi> 10.1109/PDP.2008.41]
|
 |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
 |
16
|
|
| |
17
|
|
| |
18
|
J. D. McCalpin. Memory bandwidth and machine balance in current high performance computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, pages 19--25, Dec. 1995.
|
| |
19
|
|
| |
20
|
H. Merlitz, T. Herges, and W. Wenzel. Fluctuation analysis and accuracy of a large-scale in silico screen. Journal of Computational Chemistry, 25(13):1568, October 2004.
|
| |
21
|
H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon. Top500 supercomputing sites, Nov. 2008. http://www.top500.org/.
|
| |
22
|
R. Nishtala, R. Vuduc, J. Demmel, and K. Yelick. Performance modeling and analysis of cache blocking in sparse matrix vector multiply. Technical report, Berkeley, EECS Dept., 2004.
|
| |
23
|
R. Nishtala, R. Vuduc, J. W. Demmel, and K. A. Yelick. When cache blocking sparse matrix vector multiply works and why. In Proceedings of the PARA'04: Workshop on the State-of-the-art in Scientific Computing, 2004.
|
 |
24
|
|
 |
25
|
|
| |
26
|
D. J. Rose. A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations. Graph Theory and Computing, pages 183--217, 1973.
|
| |
27
|
|
| |
28
|
|
| |
29
|
|
| |
30
|
R. Vuduc, J. W. Demmel, and K. A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proceedings of SciDAC 2005, Journal of Physics: Conference Series, San Francisco, CA, USA, June 2005. Institute of Physics Publishing.
|
| |
31
|
R. Vuduc, S. Kamil, J. Hsu, R. Nishtala, J. W. Demmel, and K. A. Yelick. Automatic performance tuning and analysis of sparse triangular solve. In ICS 2002: Workshop on Performance Optimization via High-Level Languages and Libraries, 2002.
|
| |
32
|
R. W. Vuduc and H.-J. Moon. Fast sparse matrix-vector multiplication by exploiting variable block structure. In High Performance Computing and Communcations, volume 3726 of Lecture Notes in Computer Science, pages 807--816. Springer, 2005. Editors: Laurence Tianruo Yang and Omer F. Rana and Beniamino Di Martino and Jack Dongarra.
|
 |
33
|
|
 |
34
|
Samuel Williams , Leonid Oliker , Richard Vuduc , John Shalf , Katherine Yelick , James Demmel, Optimization of sparse matrix-vector multiplication on emerging multicore platforms, Proceedings of the 2007 ACM/IEEE conference on Supercomputing, November 10-16, 2007, Reno, Nevada
[doi> 10.1145/1362622.1362674]
|
 |
35
|
|
INDEX TERMS
Primary Classification:
G.
Mathematics of Computing
G.1
NUMERICAL ANALYSIS
G.1.3
Numerical Linear Algebra
Subjects:
Sparse, structured, and very large systems (direct and iterative methods)
Additional Classification:
C.
Computer Systems Organization
C.4
PERFORMANCE OF SYSTEMS
Subjects:
Performance attributes
General Terms:
Algorithms,
Performance
Keywords:
bcoo,
bcsr,
csr,
iterative solvers,
matrix splitting,
memory bandwidth,
pbr,
prefetching,
smvm,
sparse computations,
spmv,
vectorization
|