| Adaptive runtime tuning of parallel sparse matrix-vector multiplication on distributed memory systems |
| Full text |
Pdf
(683 KB)
|
Source
|
International Conference on Supercomputing
archive
Proceedings of the 22nd annual international conference on Supercomputing
table of contents
Island of Kos, Greece
SESSION: Algorithms & applications 2
table of contents
Pages 195-204
Year of Publication: 2008
ISBN:978-1-60558-158-3
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 18, Downloads (12 Months): 168, Citation Count: 0
|
|
|
ABSTRACT
Sparse matrix-vector (SpMV) multiplication is a widely used kernel in scientific applications. In these applications, the SpMV multiplication is usually deeply nested within multiple loops and thus executed a large number of times. We have observed that there can be significant performance variability, due to irregular memory access patterns. Static performance optimizations are difficult because the patterns may be known only at runtime. In this paper, we propose adaptive runtime tuning mechanisms to improve the parallel performance on distributed memory systems. Our adaptive iteration-to-process mapping mechanism balances computational load at runtime with negligible overhead (1% on average), and our runtime communication selection algorithm searches for the best communication method for a given data distribution and mapping. Actual runs on 26 real matrices show that our runtime tuning system reduces execution time up to 68.8% (30.9% on average) over a base block-distributed parallel algorithm on distributed systems with 32 nodes.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
The message passing interface (MPI) standard {online}. available: http://www-unix.mcs.anl.gov/mpi/.
|
| |
2
|
J. I. Aliaga and V. Hernandez. Symmetric sparse matrix-vector product on distributed memory multiprocessors. Conf. on Parallel Computing and Transputer Applications, 1992.
|
| |
3
|
R. H. Bisseling and W. Meesen. Communication balancing in parallel sparse matrix-vector multiplication. Electronic Transactions on Numerical Analysis, 21:47--65, 2005.
|
| |
4
|
E. G. Boman and U. Catalyurek. Constrained fine-grain parallel sparse matrix distribution. SIAM Workshop on Combinatorial Scientific Computing, 2007.
|
| |
5
|
|
| |
6
|
T. Davis. University of Florida Sparse Matrix Collection {online}. available: http://www.cise.ufl.edu/research/sparse/matrices/.
|
| |
7
|
G. E. Fagg, J. Pjesivac-Grbovic, G. Bosilca, T. Angskun, J. J. Dongarra, and E. Jeannot. Flexible collective communication tuning architecture applied to Open MPI. Euro PVM/MPI, 2006.
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
D. Jin and S. G. Ziavras. A super-programming technique for large sparse matrix multiplication on PC clusters. IEICE Trans. Inf. & Syst., E87-D(7), July 2004.
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
R. Rabenseifner. Automatic MPI counter profiling of all users: First results on a CRAY T3E 900-512. Euro PVM/MPI, pages 35--42, 1999.
|
| |
16
|
|
| |
17
|
R. Shahnaz, A. Usman, and I. R. Chughtai. Implementation and evaluation of parallel sparse matrix-vector products on distributed memory parallel computers. IEEE Int. Conf. on Cluster Computing, pages 1--6, 2006.
|
| |
18
|
|
| |
19
|
Sathish S. Vadhiyar , Graham E. Fagg , Jack Dongarra, Automatically tuned collective communications, Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM), p.3-es, November 04-10, 2000, Dallas, Texas, United States
|
| |
20
|
|
 |
21
|
Samuel Williams , Leonid Oliker , Richard Vuduc , John Shalf , Katherine Yelick , James Demmel, Optimization of sparse matrix-vector multiplication on emerging multicore platforms, Proceedings of the 2007 ACM/IEEE conference on Supercomputing, November 10-16, 2007, Reno, Nevada
[doi> 10.1145/1362622.1362674]
|
| |
22
|
|
|