|
ABSTRACT
The gap between processor and main memory performance increases every year. In order to overcome this problem, cache memories are widely used. However, they are only effective when programs exhibit sufficient data locality. Compile-time program transformations can significantly improve the performance of the cache. To apply most of these transformations, the compiler requires a precise knowledge of the locality of the different sections of the code, both before and after being transformed.Cache miss equations (CMEs) allow us to obtain an analytical and precise description of the cache memory behavior for loop-oriented codes. Unfortunately, a direct solution of the CMEs is computationally intractable due to its NP-complete nature.This article proposes a fast and accurate approach to estimate the solution of the CMEs. We use sampling techniques to approximate the absolute miss ratio of each reference by analyzing a small subset of the iteration space. The size of the subset, and therefore the analysis time, is determined by the accuracy selected by the user. In order to reduce the complexity of the algorithm to solve CMEs, effective mathematical techniques have been developed to analyze the subset of the iteration space that is being considered. These techniques exploit some properties of the particular polyhedra represented by CMEs.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
Glenn Ammons , Thomas Ball , James R. Larus, Exploiting hardware performance counters with flow and context sensitive profiling, Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, p.85-96, June 16-18, 1997, Las Vegas, Nevada, United States
|
| |
4
|
Ayguadé, E. et al. 1995. A uniform internal representation for high-level and instruction-level transformations. Tech rep. UPC-DAC-95-02. Universitat Politècnica de Catalunya, Barcelona, Spain.
|
| |
5
|
|
| |
6
|
|
 |
7
|
|
| |
8
|
|
 |
9
|
Steve Carr , Kathryn S. McKinley , Chau-Wen Tseng, Compiler optimizations for improving data locality, Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, p.252-262, October 05-07, 1994, San Jose, California, United States
|
 |
10
|
Siddhartha Chatterjee , Vibhor V. Jain , Alvin R. Lebeck , Shyam Mundhra , Mithuna Thottethodi, Nonlinear array layouts for hierarchical memory systems, Proceedings of the 13th international conference on Supercomputing, p.444-453, June 20-25, 1999, Rhodes, Greece
[doi> 10.1145/305138.305231]
|
 |
11
|
Siddhartha Chatterjee , Erin Parker , Philip J. Hanlon , Alvin R. Lebeck, Exact analysis of the cache behavior of nested loops, Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, p.286-297, June 2001, Snowbird, Utah, United States
|
 |
12
|
|
 |
13
|
|
| |
14
|
DeGroot, M. 1998. Probability and Statistics. Addison-Wesley, Reading, MA.
|
| |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
| |
19
|
|
 |
20
|
Somnath Ghosh , Margaret Martonosi , Sharad Malik, Precise miss analysis for program transformations with caches of arbitrary associativity, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.228-239, October 02-07, 1998, San Jose, California, United States
|
 |
21
|
|
 |
22
|
Somnath Ghosh , Margaret Martonosi , Sharad Malik, Automated cache optimizations using CME driven diagnosis, Proceedings of the 14th international conference on Supercomputing, p.316-326, May 08-11, 2000, Santa Fe, New Mexico, United States
[doi> 10.1145/335231.335262]
|
 |
23
|
|
 |
24
|
|
| |
25
|
|
| |
26
|
Hill, M. n.d. DineroIII: a uniprocessor cache simulator (http://www.cs.wisc.edu/∼larus/warts.html).
|
| |
27
|
|
 |
28
|
|
| |
29
|
Kreisel, G. and Krevine, J. L. 1967. Elements of Mathematical Logic. North-Holland, Amsterdam, The Netherlands.
|
 |
30
|
Monica D. Lam , Edward E. Rothberg , Michael E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.63-74, April 08-11, 1991, Santa Clara, California, United States
|
| |
31
|
|
| |
32
|
|
 |
33
|
Margaret Martonosi , Anoop Gupta , Thomas Anderson, MemSpy: analyzing memory system bottlenecks in programs, Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, p.1-12, June 01-05, 1992, Newport, Rhode Island, United States
|
 |
34
|
Margaret Martonosi , Anoop Gupta , Thomas Anderson, Effectiveness of trace sampling for performance debugging tools, Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems, p.248-259, May 10-14, 1993, Santa Clara, California, United States
|
| |
35
|
McCabe, M. 1989. Introduction to the Practice of Statistics. Freeman & Co., New York, NY.
|
 |
36
|
|
| |
37
|
MIPS. 1988. RISCompiler Languages Programmer's Guide. MIPS Computer Systems, Sunnyvale, CA.
|
 |
38
|
Todd C. Mowry , Monica S. Lam , Anoop Gupta, Design and evaluation of a compiler algorithm for prefetching, Proceedings of the fifth international conference on Architectural support for programming languages and operating systems, p.62-73, October 12-15, 1992, Boston, Massachusetts, United States
|
| |
39
|
Padua, D. et al. 1994. Polaris Developer's Document. Available online at http://polaris.is.uiuc.edu/polaris/polaris--developer/polaris--developer.html.
|
 |
40
|
|
 |
41
|
|
 |
42
|
|
| |
43
|
|
| |
44
|
Sánchez, F. and González, A. 1998. Fast, flexible and accurate data locality analysis. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'98).
|
 |
45
|
|
| |
46
|
|
 |
47
|
O. Temam , C. Fricker , W. Jalby, Cache interference phenomena, Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, p.261-271, May 16-20, 1994, Nashville, Tennessee, United States
|
 |
48
|
|
 |
49
|
|
| |
50
|
|
| |
51
|
Vera, X., Llosa, J., and González, A. 2002. Near-optimal padding for removing conflict misses. In Proceedings of the 15th Workshop on Languages and Compilers for Parallel Computers (LCPC'02).
|
| |
52
|
|
| |
53
|
Wilde, D. 1993. A library for doing polyhedral operations. Tech. rep. 785, Oregon State University.
|
 |
54
|
|
 |
55
|
|
REVIEW
"Eno Thereska : Reviewer"
The disparity between central processing unit (CPU) and memory speeds is addressed in this paper. Memory speeds lag behind CPU speeds. Caches are often used to mitigate this problem; programs that exhibit a high degree of locality can benefit most
more...
|