ABSTRACT
Compute-intensive multi-dimensional summations involving products of several arrays arise in modeling the electronic structure of materials. Sometimes a computation admits several alternative formulations that represent different space-time trade-offs. Computing and storing intermediate arrays can reduce the number of arithmetic operations, but the intermediate temporary arrays may be prohibitively large. Loop fusion can reduce memory requirements, but it can impede the tiling needed to minimize memory access costs. This paper develops an integrated model that combines loop tiling, to enhance data reuse, with loop fusion, to reduce the memory needed for intermediate temporary arrays. An algorithm is presented that selects tile sizes and the loops to fuse, with the objective of minimizing cache misses while keeping total memory usage within a given limit. Experimental results demonstrate the effectiveness of the combined loop tiling and fusion transformations produced by the developed framework.
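The trade-off described in the abstract can be illustrated with a small sketch. The summation S[i] = sum over j, k of A[i][j] * B[j][k] * C[k], the function names, and the square n-by-n shapes are illustrative assumptions and are not taken from the paper. The sketch shows a direct evaluation, an operation-reduced form with a full intermediate array, a fused form in which the intermediate shrinks to a scalar, and a tiled form in which the intermediate holds one tile.

```python
# Illustrative sketch only: the summation and all names are assumptions,
# not the paper's benchmark or algorithm.

def direct(A, B, C):
    """Evaluate S[i] = sum_j sum_k A[i][j] * B[j][k] * C[k] with no
    intermediate array: about n**3 multiply-add steps."""
    n = len(C)
    return [
        sum(A[i][j] * B[j][k] * C[k] for j in range(n) for k in range(n))
        for i in range(n)
    ]


def unfused(A, B, C):
    """Store the intermediate T[j] = sum_k B[j][k] * C[k] in full.
    Work drops to about 2 * n**2 steps, at the cost of n extra elements."""
    n = len(C)
    T = [sum(B[j][k] * C[k] for k in range(n)) for j in range(n)]
    return [sum(A[i][j] * T[j] for j in range(n)) for i in range(n)]


def fused(A, B, C):
    """Fuse the producer and consumer loops over j, so the intermediate
    shrinks from an n-element array to a single scalar."""
    n = len(C)
    S = [0] * n
    for j in range(n):
        t = sum(B[j][k] * C[k] for k in range(n))  # scalar intermediate
        for i in range(n):
            S[i] += A[i][j] * t
    return S


def tiled_fused(A, B, C, tile):
    """Fuse only the loop over tiles of j. The intermediate holds one tile
    (at most `tile` elements) and is reused for every i within that tile."""
    n = len(C)
    S = [0] * n
    for j0 in range(0, n, tile):
        j1 = min(j0 + tile, n)
        T = [sum(B[j][k] * C[k] for k in range(n)) for j in range(j0, j1)]
        for i in range(n):
            for j in range(j0, j1):
                S[i] += A[i][j] * T[j - j0]
    return S
```

All four functions compute the same result. They differ in operation count, in the size of the temporary storage, and in loop order, which are the quantities that a combined choice of tile sizes and fused loops has to balance.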
CITED BY 6
Daniel Cociorva, Gerald Baumgartner, Chi-Chung Lam, P. Sadayappan, J. Ramanujam, Marcel Nooijen, David E. Bernholdt, Robert Harrison, Space-time trade-off optimization for a class of electronic structure calculations, ACM SIGPLAN Notices, v.37 n.5, May 2002
Gerald Baumgartner, David E. Bernholdt, Daniel Cociorva, Robert Harrison, So Hirata, Chi-Chung Lam, Marcel Nooijen, Russell Pitzer, J. Ramanujam, P. Sadayappan, A high-level approach to synthesis of high-performance codes for quantum chemistry, Proceedings of the 2002 ACM/IEEE conference on Supercomputing, p.1-10, November 16, 2002, Baltimore, Maryland
Xiaoyang Gao, Swarup Kumar Sahoo, Chi-Chung Lam, J. Ramanujam, Qingda Lu, Gerald Baumgartner, P. Sadayappan, Performance modeling and optimization of parallel out-of-core tensor contractions, Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, June 15-17, 2005, Chicago, IL, USA
Sandhya Krishnan, Sriram Krishnamoorthy, Gerald Baumgartner, Chi-Chung Lam, J. Ramanujam, P. Sadayappan, Venkatesh Choppella, Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver, Journal of Parallel and Distributed Computing, v.66 n.5, p.659-673, May 2006