ACM Home Page
Please provide us with feedback. Feedback
Adaptive Strassen's matrix multiplication
Full text PdfPdf (770 KB)
Source
International Conference on Supercomputing archive
Proceedings of the 21st annual international conference on Supercomputing table of contents
Seattle, Washington
SESSION: Application optimization table of contents
Pages: 284 - 292  
Year of Publication: 2007
ISBN:978-1-59593-768-1
Authors
Paolo D'Alberto  Yahoo! Sunnyvale, CA
Alexandru Nicolau  University of California, Irvine
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 125,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1274971.1275010
What is a DOI?

ABSTRACT

Strassen's matrix multiplication (MM) has benefits with respect to any (highly tuned) implementations of MM because Strassen's reduces the total number of operations. Strassen achieved this operation reduction by replacing computationally expensive MMs with matrix additions (MAs). For architectures with simple memory hierarchies, having fewer operations directly translates into an efficient utilization of the CPU and, thus, faster execution. However, for modern architectures with complex memory hierarchies, the operations introduced by the MAs have a limited in-cache data reuse and thus poor memory-hierarchy utilization, thereby overshadowing the (improved) CPU utilization, and making Strassen's algorithm (largely) useless on its own.

In this paper, we investigate the interaction between Strassen's effective performance and the memory-hierarchy organization. We show how to exploit Strassen's full potential across different architectures. We present an easy-to-use adaptive algorithm that combines a novel implementation of Strassen's idea with the MM from automatically tuned linear algebra software (ATLAS) or GotoBLAS. An additional advantage of our algorithm is that it applies to any size and shape matrices and works equally well with row or column major layout. Our implementation consists of introducing a final step in the ATLAS/GotoBLAS-installation process that estimates whether or not we can achieve any additional speedup using our Strassen's adaptation algorithm. Then we install our codes, validate our estimates, and determine the specific performance.

We show that, by the right combination of Strassen's with ATLAS/GotoBLAS, our approach achieves up to 30%/22% speed-up versus ATLAS/GotoBLAS alone on modern high-performance single processors. We consider and present the complexity and the numerical analysis of our algorithm, and, finally, we show performance for 17 (uniprocessor) systems.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
 
4
 
5
R. P. Brent. Error analysis of algorithms for matrix multiplication and triangular decomposition using Winograd's identity. Numerische Mathematik, 16:145--156, 1970.
6
 
7
H. Cohn, R. Kleinberg, B. Szegedy, and C. Umans. Group-theoretic algorithms for matrix multiplication, Nov 2005.
8
 
9
 
10
P. D'Alberto and A. Nicolau. Using recursion to boost ATLAS's performance. In The Sixth International Symposium on High Performance Computing (ISHPC-VI), 2005.
 
11
J. Demmel, J. Dongarra, E. Eijkhout, E. Fuentes, E. Petitet, V. Vuduc, R. C. Whaley, and K. Yelick. Self-Adapting linear algebra algorithms and software. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2), 2005.
 
12
J. Demmel, J. Dumitriu, O. Holtz, and R. Kleinberg. Fast matrix multiplication is stable, Mar 2006.
13
 
14
N. Eiron, M. Rodeh, and I. Steinwarts. Matrix multiplication: a case study of algorithm engineering. In Proceedings WAE'98, Saarbrücken, Germany, Aug 1998.
15
 
16
M. Frigo and S. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2):216--231, 2005.
 
17
 
18
K. Goto and R. A. van de Geijn. Anatomy of high-performance matrix multiplication. ACM Transactions on Mathematical Software.
19
20
 
21
 
22
 
23
I. Kaporin. A practical algorithm for faster matrix multiplication. Numerical Linear Algebra with Applications, 6(8):687--700, 1999. Centre for Supercomputer and Massively Parallel Applications, Computing Centre of the Russian Academy of Sciences, Vavilova 40, Moscow 117967, Russia.
 
24
 
25
P. Knight. Fast rectangular matrix multiplication and QR-decomposition. Linear algebra and its applications, 221:69--81, 1995.
 
26
 
27
V. Pan. Strassen's algorithm is not optimal: Trililnear technique of aggregating, uniting and canceling for constructing fast algorithms for matrix operations. In FOCS, pages 166--176, 1978.
 
28
 
29
D. Priest. Algorithms for arbitrary precision floating point arithmetic. In P. Kornerup and D. W. Matula, editors, Proceedings of the 10th IEEE Symposium on Computer Arithmetic (Arith-10), pages 132--144, Grenoble, France, 1991. IEEE Computer Society Press, Los Alamitos, CA.
 
30
M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", 93(2), 2005.
 
31
V. Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 14(3):354--356, 1969.
 
32
 
33



REVIEW

"James Harold Davenport : Reviewer"

This paper studies how Strassen’s matrix multiplication, a divide-and-conquer approach, can be used in practice on real machines with today’s architectures. The authors are essentially only considering single processors. Though some du  more...

Collaborative Colleagues:
Paolo D'Alberto: colleagues
Alexandru Nicolau: colleagues