|
ABSTRACT
We develop from first principles an exact model of the behavior of loop nests executing in a memory hicrarchy, by using a nontraditional classification of misses that has the key property of composability. We use Presburger formulas to express various kinds of misses as well as the state of the cache at the end of the loop nest. We use existing tools to simplify these formulas and to count cache misses. The model is powerful enough to handle imperfect loop nests and various flavors of non-linear array layouts based on bit interleaving of array indices. We also indicate how to handle modest levels of associativity, and how to perform limited symbolic analysis of cache behavior. The complexity of the formulas relates to the static structure of the loop nest rather than to its dynamic trip count, allowing our model to gain efficiency in counting cache misses by exploiting repetitive patterns of cache behavior. Validation against cache simulation confirms the exactness of our formulation. Our method can serve as the basis for a static performance predictor to guide program and data transformations to improve performance.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
 |
6
|
Tod Amon , Gaetano Borriello , Taokuan Hu , Jiwen Liu, Symbolic timing verification of timing diagrams using Presburger formulas, Proceedings of the 34th annual conference on Design automation, p.226-231, June 09-13, 1997, Anaheim, California, United States
[doi> 10.1145/266021.266071]
|
 |
7
|
Tod Amon , Gaetano Borriello , Jiwen Liu, Making complex timing relationships readable: Presburger formula simplicication using don't cares, Proceedings of the 35th annual conference on Design automation, p.586-590, June 15-19, 1998, San Francisco, California, United States
[doi> 10.1145/277044.277198]
|
| |
8
|
|
| |
9
|
|
| |
10
|
P. Boulet and X. Redon. SPPoC: fonctionnemen et applications. Research Report 00-04, LIFL (Laboratoire de Recherche en Informatique de l'Universite des Sciences et Technologies de Lille), 2000. In French. Also see http://www.lifl.fr/west/sppoc/.
|
| |
11
|
M. Brehob and R. Enbody. A mathematical model of locality and caching. Technical Report TR-MSU-CPS-96-TBD, Michigan State University, Nov. 1996.
|
| |
12
|
|
 |
13
|
Steve Carr , Kathryn S. McKinley , Chau-Wen Tseng, Compiler optimizations for improving data locality, Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, p.252-262, October 05-07, 1994, San Jose, California, United States
|
 |
14
|
Siddhartha Chatterjee , Vibhor V. Jain , Alvin R. Lebeck , Shyam Mundhra , Mithuna Thottethodi, Nonlinear array layouts for hierarchical memory systems, Proceedings of the 13th international conference on Supercomputing, p.444-453, June 20-25, 1999, Rhodes, Greece
[doi> 10.1145/305138.305231]
|
 |
15
|
Siddhartha Chatterjee , Alvin R. Lebeck , Praveen K. Patnala , Mithuna Thottethodi, Recursive array layouts and fast parallel matrix multiplication, Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, p.222-231, June 27-30, 1999, Saint Malo, France
[doi> 10.1145/305619.305645]
|
| |
16
|
S. Chatterjee and S. Sen. Cache-efficient matrix transposition. In Proceedings of HPCA-6, pages 195-205, Toulouse, France, Jan. 2000.
|
 |
17
|
|
 |
18
|
|
 |
19
|
|
| |
20
|
P. Feautrier. Dataflow analysis of array and scalar references. International Journal of Parallel Programming, 20(1):23-54, 1991.
|
 |
21
|
|
 |
22
|
|
| |
23
|
|
 |
24
|
|
 |
25
|
Somnath Ghosh , Margaret Martonosi , Sharad Malik, Precise miss analysis for program transformations with caches of arbitrary associativity, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.228-239, October 02-07, 1998, San Jose, California, United States
|
 |
26
|
|
| |
27
|
Philip J. Hanlon , Dean Chung , Siddhartha Chatterjee , Daniela Genius , Alvin R. Lebeck , Erin Parker, The combinatorics of cache misses during matrix multiplication, Journal of Computer and System Sciences, v.63 n.1, p.80-126, August 2001
[doi> 10.1006/jcss.2001.1756]
|
| |
28
|
|
 |
29
|
|
| |
30
|
|
| |
31
|
|
| |
32
|
|
| |
33
|
W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman, and D. Wonnacott. The Omega Calculator and Library, version 1.1.0, Nov. 1996.
|
| |
34
|
W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman, and D. Wonnacott. The Omega Library Version 1.1.0 Interface Guide, Nov. 1996.
|
| |
35
|
|
| |
36
|
W. Kelly and W. Pugh. Finding legal reordering transformations using mappings. Technical Report CS-TR-3297, Department of Compute Science, University of Maryland, College Park, MD, June 1994.
|
 |
37
|
|
 |
38
|
Monica D. Lam , Edward E. Rothberg , Michael E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.63-74, April 08-11, 1991, Santa Clara, California, United States
|
| |
39
|
|
 |
40
|
|
 |
41
|
|
| |
42
|
V. Loechner. PolyLib: A Library for Manipulating Parameterized Polyhedra, Mar. 1999.
|
 |
43
|
Margaret Martonosi , Anoop Gupta , Thomas Anderson, MemSpy: analyzing memory system bottlenecks in programs, Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, p.1-12, June 01-05, 1992, Newport, Rhode Island, United States
|
 |
44
|
|
| |
45
|
|
| |
46
|
D. C. Oppen. A 2 2 2 pn upper bound on the complexity of Presburger arithmetic. J. Comput. Syst. Sci., 16(3):323-332, July 1978.
|
 |
47
|
Yunheung Paek , Jay Hoeflinger , David Padua, Simplification of array access patterns for compiler optimizations, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, p.60-71, June 17-19, 1998, Montreal, Quebec, Canada
|
| |
48
|
|
 |
49
|
|
 |
50
|
|
 |
51
|
|
| |
52
|
U. Sch~ning. Complexity of Presburger arithmetic with fixed quantifier dimension. Theory of Computing Systems, 30:423-428, 1997.
|
| |
53
|
N. Shibata, K. Okana, T. Higashino, and K. Taniguchi. A decision algorithm dor prenex form rational Presburger sentences based on combinatorial geometry. In Proceedings of the 2nd International Conference on Discrete Mathematics and Theoretical Computer Science and the 5th Australasian Theory Symposium (DMTCS'99+CATS'99), pages 344-359, Jan. 1999.
|
 |
54
|
|
| |
55
|
The Stanford Compiler Group. SUIF: An Infrastructure for Research on Parallelizing and Optimizing Compilers. http://suif.stanford.edu.
|
| |
56
|
R. A. Sugumar and S. G. Abraham. Efficient simulation of multiple cache configurations using binomial trees. Technical Report CSE-TR-111-91, 1991.
|
 |
57
|
|
| |
58
|
|
| |
59
|
|
 |
60
|
|
 |
61
|
|
 |
62
|
|
 |
63
|
David A. Wood , Mark D. Hill , R. E. Kessler, A model for estimating trace-sample miss ratios, Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems, p.79-89, May 21-24, 1991, San Diego, California, United States
|
| |
64
|
H. Zhang and M. Martonosi. Mathematical cache miss analysis for pointer data structures. In Proceedings of the SIAM Conference on Parallel Processing for Scientific Computing, Portsmouth, VA, Mar. 2001. CD-ROM.
|
CITED BY 37
|
|
|
|
|
Allan Snavely , Laura Carrington , Nicole Wolter , Jesus Labarta , Rosa Badia , Avi Purkayastha, A framework for performance modeling and prediction, Proceedings of the 2002 ACM/IEEE conference on Supercomputing, p.1-17, November 16, 2002, Baltimore, Maryland
|
|
|
|
|
|
Richard Vuduc , James W. Demmel , Katherine A. Yelick , Shoaib Kamil , Rajesh Nishtala , Benjamin Lee, Performance optimizations and bounds for sparse matrix-vector multiply, Proceedings of the 2002 ACM/IEEE conference on Supercomputing, p.1-35, November 16, 2002, Baltimore, Maryland
|
|
|
|
|
|
Jaydeep Marathe , Frank Mueller , Tushar Mohan , Bronis R. de Supinski , Sally A. McKee , Andy Yoo, METRIC: tracking down inefficiencies in the memory hierarchy via binary rewriting, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, March 23-26, 2003, San Francisco, California
|
|
|
|
|
|
H. S. Kim , N. Vijaykrishnan , M. Kandemir , E. Brockmeyer , F. Catthoor , M. J. Irwin, Estimating influence of data layout optimizations on SDRAM energy consumption, Proceedings of the 2003 international symposium on Low power electronics and design, August 25-27, 2003, Seoul, Korea
|
|
|
|
|
|
|
|
|
Sven Verdoolaege , Rachid Seghir , Kristof Beyls , Vincent Loechner , Maurice Bruynooghe, Analytical computation of Ehrhart polynomials: enabling more compiler analyses and optimizations, Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, September 22-25, 2004, Washington DC, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Jaydeep Marathe , Frank Mueller , Tushar Mohan , Sally A. Mckee , Bronis R. De Supinski , Andy Yoo, METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies, ACM Transactions on Programming Languages and Systems (TOPLAS), v.29 n.2, p.12-es, April 2007
|
|
|
|
|
|
|
|
|
Arun Kejariwal , Alexandru Nicolau , Utpal Banerjee , Alexander V. Veidenbaum , Constantine D. Polychronopoulos, Cache-aware iteration space partitioning, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, February 20-23, 2008, Salt Lake City, UT, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Arun Kejariwal , Alexandru Nicolau , Utpal Banerjee , Alexander V. Veidenbaum , Constantine D. Polychronopoulos, Cache-aware partitioning of multi-dimensional iteration spaces, Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference, May 04-April 06, 2009, Haifa, Israel
|
|
|
|
|