ACM Home Page
Please provide us with feedback. Feedback
Pattern-based sparse matrix representation for memory-efficient SMVM kernels
Full text PdfPdf (862 KB)
Source
International Conference on Supercomputing archive
Proceedings of the 23rd international conference on Supercomputing table of contents
Yorktown Heights, NY, USA
SESSION: Optimizing parallel applications table of contents
Pages 100-109  
Year of Publication: 2009
ISBN:978-1-60558-498-0
Authors
Mehmet Belgin  Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
Godmar Back  Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
Calvin J. Ribbens  Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 28,   Downloads (12 Months): 103,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1542275.1542294
What is a DOI?

ABSTRACT

Pattern-based Representation (PBR) is a novel approach to improving the performance of Sparse Matrix-Vector Multiply (SMVM) numerical kernels. Motivated by our observation that many matrices can be divided into blocks that share a small number of distinct patterns, we generate custom multiplication kernels for frequently recurring block patterns. The resulting reduction in index overhead significantly reduces memory bandwidth requirements and improves performance. Unlike existing methods, PBR requires neither detection of dense blocks nor zero filling, making it particularly advantageous for matrices that lack dense nonzero concentrations. SMVM kernels for PBR can benefit from explicit prefetching and vectorization, and are amenable to parallelization. We present sequential and parallel performance results for PBR on two current multicore architectures, which show that PBR outperforms available alternatives for the matrices to which it is applicable.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
A. Antoulas, D. Sorensen, and S. Gugercin. A survey of model reduction methods for large-scale systems. In Structured Matrices in Operator Theory, Numerical Analysis, Control, Signal and Image Processing, Contemporary Mathematics, number 280, pages 193--219. AMS Publications, 2001.
 
3
C. Beattie and S. Gugercin. Inexact solves in krylov-based model reduction. In Proceedings of the 45th IEEE Conference on Decision and Control, pages 3405--3411, December 2006.
4
 
5
 
6
T. Davis. The University of Florida sparse matrix collection. http://www.cise.u .edu/research/sparse/matrices.
 
7
E. F. D'Azevedo, M. R. Fahey, and R. T. Mills. Vectorized sparse matrix multiply for compressed row storage format. In International Conference on Computational Science, May 2005.
8
 
9
 
10
T. Gneiting and A. E. Raftery. Weather forecasting with ensemble methods. Science (Washington, D.C.), 310(5746):248, 2005.
 
11
12
 
13
 
14
 
15
16
 
17
 
18
J. D. McCalpin. Memory bandwidth and machine balance in current high performance computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, pages 19--25, Dec. 1995.
 
19
 
20
H. Merlitz, T. Herges, and W. Wenzel. Fluctuation analysis and accuracy of a large-scale in silico screen. Journal of Computational Chemistry, 25(13):1568, October 2004.
 
21
H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon. Top500 supercomputing sites, Nov. 2008. http://www.top500.org/.
 
22
R. Nishtala, R. Vuduc, J. Demmel, and K. Yelick. Performance modeling and analysis of cache blocking in sparse matrix vector multiply. Technical report, Berkeley, EECS Dept., 2004.
 
23
R. Nishtala, R. Vuduc, J. W. Demmel, and K. A. Yelick. When cache blocking sparse matrix vector multiply works and why. In Proceedings of the PARA'04: Workshop on the State-of-the-art in Scientific Computing, 2004.
24
25
 
26
D. J. Rose. A graph-theoretic study of the numerical solution of sparse positive definite systems of linear equations. Graph Theory and Computing, pages 183--217, 1973.
 
27
 
28
 
29
 
30
R. Vuduc, J. W. Demmel, and K. A. Yelick. OSKI: A library of automatically tuned sparse matrix kernels. In Proceedings of SciDAC 2005, Journal of Physics: Conference Series, San Francisco, CA, USA, June 2005. Institute of Physics Publishing.
 
31
R. Vuduc, S. Kamil, J. Hsu, R. Nishtala, J. W. Demmel, and K. A. Yelick. Automatic performance tuning and analysis of sparse triangular solve. In ICS 2002: Workshop on Performance Optimization via High-Level Languages and Libraries, 2002.
 
32
R. W. Vuduc and H.-J. Moon. Fast sparse matrix-vector multiplication by exploiting variable block structure. In High Performance Computing and Communcations, volume 3726 of Lecture Notes in Computer Science, pages 807--816. Springer, 2005. Editors: Laurence Tianruo Yang and Omer F. Rana and Beniamino Di Martino and Jack Dongarra.
33
34
35

Collaborative Colleagues:
Mehmet Belgin: colleagues
Godmar Back: colleagues
Calvin J. Ribbens: colleagues