APPENDICES and SUPPLEMENTS
|
|
The proof of Theorem 3.1 is given in an electronic appendix, available online in the ACM Digital Library.
|
ABSTRACT
In the past two decades, some major efforts have been made to reduce exact (e.g. integer, rational, polynomial) linear algebra problems to matrix multiplication in order to provide algorithms with optimal asymptotic complexity. To provide efficient implementations of such algorithms one need to be careful with the underlying arithmetic. It is well known that modular techniques such as the Chinese remainder algorithm or the p-adic lifting allow very good practical performance, especially when word size arithmetic is used. Therefore, finite field arithmetic becomes an important core for efficient exact linear algebra libraries. In this article, we study high performance implementations of basic linear algebra routines over word size prime fields: especially matrix multiplication; our goal being to provide an exact alternate to the numerical BLAS library. We show that this is made possible by a careful combination of numerical computations and asymptotically faster algorithms. Our kernel has several symbolic linear algebra applications enabled by diverse matrix multiplication reductions: symbolic triangularization, system solving, determinant, and matrix inverse implementations are thus studied.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
E. Anderson , Z. Bai , C. Bischof , L. S. Blackford , J. Demmel , Jack J. Dongarra , J. Du Croz , S. Hammarling , A. Greenbaum , A. McKenney , D. Sorensen, LAPACK Users' guide (third ed.), Society for Industrial and Applied Mathematics, Philadelphia, PA, 1999
|
| |
3
|
|
| |
4
|
Brassel, M., Giorgi, P., and Pernet, C. 2003. LUdivine: A symbolic block LU factorisation for matrices over finite fields using BLAS. Poster, http://ljk.imag.fr/membres/Jean−Guillaume.Dumas/FFLAS/FFLAS_Download/ludivine_poster_eccad2003.ps.gz.
|
| |
5
|
Bunch, J. R. and Hopcroft, J. E. 1974. Triangular factorization and inversion by fast matrix multiplication. Mathematics of Computation 28, 231--236.
|
| |
6
|
Chen, Z. and Storjohann, A. 2003. Effective reductions to matrix multiplication. 9th International Conference on Applications of Computer Algebra (ACA’2003), Raleigh, North Carolina State University.
|
| |
7
|
|
| |
8
|
Courrieu, P. 2005. Fast computation of Moore-Penrose inverse matrices. Neural Information Processing - Letters and Reviews 8, 2 (Aug.), 25--29.
|
| |
9
|
Dixon, J. D. 1982. Exact solution of linear equations using p-adic expansions. Numerische Mathematik 40, 137--141.
|
| |
10
|
Dongarra, J. and Eijkhout, V. 2003. Self-adapting numerical software and automatic tuning of heuristics. Lecture Notes in Computer Science 2660, 759--770.
|
 |
11
|
|
| |
12
|
|
| |
13
|
Dumas, J.-G., , Giorgi, P., and Pernet, C. 2006. FFLAS-FFPACK: Finite field linear algebra subroutine/package. Software, http://ciel.ccsd.cnrs.fr/ciel−00000025. Feb.
|
| |
14
|
Dumas, J.-G. 2004. Efficient dot product over finite fields. In Proceedings of the Seventh International Workshop on Computer Algebra in Scientific Computing, (Yalta, Ukraine). V. G. Ganzha, E. W. Mayr, and E. V. Vorozhtsov, Eds. Technische Universität München, Germany, 139--154.
|
| |
15
|
Dumas, J.-G. 2007. Q-adic transform revisited. Tech. Rep. 0710.0510 {cs.SC}, ArXiv. Oct. http://hal.archives-ouvertes.fr/hal-00173894.
|
| |
16
|
Dumas, J.-G., Gautier, T., Giesbrecht, M., Giorgi, P., Hovinen, B., Kaltofen, E., Saunders, B. D., Turner, W. J., and Villard, G. 2002a. LinBox: A generic library for exact linear algebra. In Proceedings of the 2002 International Congress of Mathematical Software (Beijing, China). A. M. Cohen, X.-S. Gao, and N. Takayama, Eds. World Scientific Pub, 40--50.
|
 |
17
|
|
 |
18
|
|
| |
19
|
Dumas, J.-G., Pernet, C., and Roch, J.-L. 2006. Adaptive triangular system solving. In Challenges in Symbolic Computation Software. Dagstuhl Seminar proceedings 06271, paper 770.
|
 |
20
|
|
| |
21
|
Dumas, J.-G., Pernet, C., and Zhou, W. 2007. Memory efficient scheduling of Strassen-Winograd’s matrix multiplication algorithm. Tech. rep., arXiv:0707.2347v2. Aug. http://arxiv.org/abs/0707.2347v2.
|
| |
22
|
|
| |
23
|
|
| |
24
|
|
| |
25
|
Giorgi, P. 2003. From BLAS routines to finite field exact linear algebra solutions. 9th International Conference on Applications of Computer Algebra (ACA’2003), Raleigh, North Carolina State University.
|
 |
26
|
Pascal Giorgi , Claude-Pierre Jeannerod , Gilles Villard, On the complexity of polynomial matrix computations, Proceedings of the 2003 international symposium on Symbolic and algebraic computation, p.135-142, August 03-06, 2003, Philadelphia, PA, USA
[doi> 10.1145/860854.860889]
|
| |
27
|
|
| |
28
|
Goto, K. and van de Geijn, R. 2002. On reducing tlb misses in matrix multiplication. Tech. Rep. TR-2002-55, University of Texas. Nov. FLAME working note #9.
|
| |
29
|
Fred G. Gustavson , André Henriksson , Isak Jonsson , Bo Kågström , Per Ling, Recursive Blocked Data Formats and BLAS's for Dense Linear Algebra Algorithms, Proceedings of the 4th International Workshop on Applied Parallel Computing, Large Scale Scientific and Industrial Problems, p.195-206, June 14-17, 1998
|
 |
30
|
|
| |
31
|
Steven Huss-Lederman , Elaine M. Jacobson , Anna Tsao , Thomas Turnbull , Jeremy R. Johnson, Implementation of Strassen's algorithm for matrix multiplication, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), p.32-es, January 01-01, 1996, Pittsburgh, Pennsylvania, United States
[doi> 10.1145/369028.369096]
|
| |
32
|
Huss-Lederman, S., Jacobson, E. M., Johnson, J. R., Tsao, A., and Turnbull, T. 1996b. Strassen’s algorithm for matrix multiplication : modeling analysis, and implementation. Tech. rep., Center for Computing Sciences. Nov. CCS-TR-96-17.
|
| |
33
|
Ibarra, O. H., Moran, S., and Hui, R. 1982. A generalization of the fast LUP matrix decomposition algorithm and applications. J. Algor. 3, 1 (Mar.), 45--56.
|
| |
34
|
|
| |
35
|
|
| |
36
|
Laderman, J., Pan, V., and Sha, X.-H. 1992. On practical algorithms for accelerated matrix multiplication. Linear Algebra Appl. 162--164, 557--588.
|
| |
37
|
Montgomery, P. L. 1985. Modular multiplication without trial division. Math. Computat. 44, 170 (Apr.), 519--521.
|
| |
38
|
Montgomery, P. L. 1995. A block Lanczos algorithm for finding dependencies over gf(2). In Proceedings of the 1995 International Conference on the Theory and Application of Cryptographic Techniques (Saint-Malo, France). L. C. Guillou and J.-J. Quisquater, Eds. Lecture Notes in Computer Science, vol. 921. 106--120.
|
| |
39
|
Noble, B. 1966. A method for computing the generalized inverse of a matrix. SIAM J. Numer. Anal. 3, 4 (Dec.), 582--584.
|
| |
40
|
|
| |
41
|
Pernet, C. 2001. Implementation of Winograd’s matrix multiplication over finite fields using ATLAS level 3 BLAS. Tech. Rep. RR011122, Laboratoire Informatique et Distribution. July. http://ljk.imag.fr/membres/Jean−Guillaume.Dumas/FFLAS/FFLAS_Download/FFLAS_technical_report.ps.gz.
|
 |
42
|
|
| |
43
|
Shoup, V. 2002. NTL 5.3: A library for doing number theory. www.shoup.net/ntl.
|
| |
44
|
|
| |
45
|
Strassen, V. 1969. Gaussian elimination is not optimal. Numerische Mathematik 13, 354--356.
|
| |
46
|
Whaley, R. C., Petitet, A., and Dongarra, J. J. 2001. Automated empirical optimizations of software and the ATLAS project. Para. Comput. 27, 1--2 (Jan.), 3--35. http://www.netlib.org/utk/people/JackDongarra/PAPERS/atlas_pub.pdf.
|
| |
47
|
Zassenhaus, H. 1978. A remark on the Hensel factorization method. Mathematics of Computation 32, 141 (Jan.), 287--292.
|
|