| Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology |
| Source
|
International Conference on Supercomputing
Proceedings of the 11th international conference on Supercomputing
Vienna, Austria
Pages: 340 - 347
Year of Publication: 1997
ISBN: 0-89791-902-5
|
|
Authors
|
|
Jeff Bilmes
|
CS Division, University of California at Berkeley, Berkeley, CA and International Computer Science Institute, Berkeley, CA
|
|
Krste Asanovic
|
CS Division, University of California at Berkeley, Berkeley, CA and International Computer Science Institute, Berkeley, CA
|
|
Chee-Whye Chin
|
CS Division, University of California at Berkeley, Berkeley, CA and International Computer Science Institute, Berkeley, CA
|
|
Jim Demmel
|
CS Division, University of California at Berkeley, Berkeley, CA and International Computer Science Institute, Berkeley, CA
|
|
REFERENCES
| |
ABB+92
|
E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, D. Sorensen, LAPACK Users' Guide, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1992
|
| |
ACF95
|
|
| |
AGZ94
|
R. Agarwal, F. Gustavson, and M. Zuber. IBM Engineering and Scientific Subroutine Library Guide and Reference, 1994. Available through IBM branch offices.
|
| |
BAD+
|
J. Bilmes, K. Asanovic, J. Demmel, D. Lam, and C.-W. Chin. The PHiPAC WWW home page. http://www.icsi.berkeley.edu/~bilmes/phipac
|
| |
BAD+96
|
J. Bilmes, K. Asanovic, Jim Demmel, D. Lam, C. Chin, Optimizing Matrix Multiply using PHiPAC: a Portable, High-Performance, ANSI C Coding Methodology, University of Tennessee, Knoxville, TN, 1996
|
| |
BLL93
|
B. Kågström, P. Ling, and C. Van Loan. Portable high performance GEMM-based level 3 BLAS. In R.F. Sincovec et al., editor, Parallel Processing for Scientific Computing, pages 339-346, Philadelphia, 1993. SIAM Publications.
|
| |
BLS91
|
|
| |
CDD+96
|
J. Choi, J. Demmel, I. Dhillon, J. Dongarra, S. Ostrouchov, A. Petitet, K. Stanley, D. Walker, R. C. Whaley, LAPACK Working Note 95: ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers -- Design Issues and Performance, University of Tennessee, Knoxville, TN, 1995
|
| |
CFH95
|
|
 |
DCDH90
|
|
 |
DCHHS88
|
|
| |
GL89
|
G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
|
| |
KHM94
|
|
 |
LHKK79
|
|
 |
LRW91
|
Monica D. Lam, Edward E. Rothberg, Michael E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.63-74, April 08-11, 1991, Santa Clara, California, United States
|
 |
MS95
|
|
| |
SMP+96
|
Rafael H. Saavedra-Barrera, Weihua Mao, Daeyeon Park, Jacqueline Chame, Sungdo Moon, The Combined Effectiveness of Unimodular Transformations, Tiling, and Software Prefetching, Proceedings of the 10th International Parallel Processing Symposium, p.39-45, April 15-19, 1996
|
 |
WL91
|
|
| |
Wol96
|
|
CITED BY 69
|
|
|
Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K. Patnala, Mithuna Thottethodi, Recursive array layouts and fast parallel matrix multiplication, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.222-231, June 27-30, 1999, Saint Malo, France
|
|
|
Richard Vuduc, James W. Demmel, Katherine A. Yelick, Shoaib Kamil, Rajesh Nishtala, Benjamin Lee, Performance optimizations and bounds for sparse matrix-vector multiply, Proceedings of the 2002 ACM/IEEE conference on Supercomputing, p.1-35, November 16, 2002, Baltimore, Maryland
|
|
|
Siddhartha Chatterjee, Vibhor V. Jain, Alvin R. Lebeck, Shyam Mundhra, Mithuna Thottethodi, Nonlinear array layouts for hierarchical memory systems, Proceedings of the 13th international conference on Supercomputing, p.444-453, June 20-25, 1999, Rhodes, Greece
|
|
|
Shigeo Itou, Satoshi Matsuoka, Hirokazu Hasegawa, AJaPACK: experiments in performance portable parallel Java numerical libraries, Proceedings of the ACM 2000 conference on Java Grande, p.140-149, June 03-04, 2000, San Francisco, California, United States
|
|
|
Michael A. Bender, Giridhar Pemmasani, Steven Skiena, Pavel Sumazin, Finding least common ancestors in directed acyclic graphs, Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, p.845-854, January 07-09, 2001, Washington, D.C., United States
|
|
|
Christian Weiß, Wolfgang Karl, Markus Kowarschik, Ulrich Rüde, Memory characteristics of iterative methods, Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM), p.31-es, November 14-19, 1999, Portland, Oregon, United States
|
|
|
Gerald Baumgartner, David E. Bernholdt, Daniel Cociorva, Robert Harrison, So Hirata, Chi-Chung Lam, Marcel Nooijen, Russell Pitzer, J. Ramanujam, P. Sadayappan, A high-level approach to synthesis of high-performance codes for quantum chemistry, Proceedings of the 2002 ACM/IEEE conference on Supercomputing, p.1-10, November 16, 2002, Baltimore, Maryland
|
|
|
Markus Püschel, José M. F. Moura, Bryan Singer, Jianxin Xiong, Jeremy Johnson, David Padua, Manuela Veloso, Robert W. Johnson, Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms, International Journal of High Performance Computing Applications, v.18 n.1, p.21-45, February 2004
|
|
|
P. M. W. Knijnenburg, T. Kisuki, M. F. P. O'Boyle, Iterative compilation, Embedded processor design challenges: systems, architectures, modeling, and simulation-SAMOS, Springer-Verlag New York, Inc., New York, NY, 2002
|
|
|
J. Dongarra, G. Bosilca, Z. Chen, V. Eijkhout, G. E. Fagg, E. Fuentes, J. Langou, P. Luszczek, J. Pjesivac-Grbovic, K. Seymour, H. You, S. S. Vadhiyar, Self-adapting numerical software (SANS) effort, IBM Journal of Research and Development, v.50 n.2/3, p.223-238, March 2006
|
|
|
Lamia Youseff, Keith Seymour, Haihang You, Jack Dongarra, Rich Wolski, The impact of paravirtualized memory hierarchy on linear algebra computational kernels and software, Proceedings of the 17th international symposium on High performance distributed computing, June 23-27, 2008, Boston, MA, USA
|
|
|
Muthu Manikandan Baskaran, Uday Bondhugula, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan, Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, February 20-23, 2008, Salt Lake City, UT, USA
|
|
|
DaeGon Kim, Lakshminarayanan Renganarayanan, Dave Rostron, Sanjay Rajopadhye, Michelle Mills Strout, Multi-level tiling: M for the price of one, Proceedings of the 2007 ACM/IEEE conference on Supercomputing, November 10-16, 2007, Reno, Nevada
|
|
|
Manman Ren, Ji Young Park, Mike Houston, Alex Aiken, William J. Dally, A tuning framework for software-managed memory hierarchies, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
|
|
|
Hans Zima, Mary Hall, Chun Chen, Jacqueline Chame, Model-guided autotuning of high-productivity languages for petascale computing, Proceedings of the 18th ACM international symposium on High performance distributed computing, p.151-166, June 11-13, 2009, Garching, Germany
|
|
|
Jason Ansel, Cy Chan, Yee Lok Wong, Marek Olszewski, Qin Zhao, Alan Edelman, Saman Amarasinghe, PetaBricks: a language and compiler for algorithmic choice, ACM SIGPLAN Notices, v.44 n.6, June 2009
|
|
|
Lamia Youseff, Keith Seymour, Haihang You, Dmitrii Zagorodnov, Jack Dongarra, Rich Wolski, Paravirtualization effect on single- and multi-threaded memory-intensive linear algebra software, Cluster Computing, v.12 n.2, p.101-122, June 2009
|
|