ABSTRACT
Applications running on the StrongARM SA-1110 or XScale processor cores can specify the cache mapping for each virtual page to achieve better cache utilization. In this work, we describe a method for performing cache mapping efficiently. Under this scheme, a number of loops are selected automatically for sampling, based on clock profiling information. We formulate the optimal cache mapping problem as an Integer Linear Programming (ILP) problem. Experiments on 14 test programs show that our sample-based cache mapping scheme speeds up 13 of them over the default mapping; the geometric mean of the speedups across all 14 programs is 1.098. Furthermore, compared with a previous heuristic method that uses the full memory trace, the sample-based method performs cache mapping an order of magnitude faster without sacrificing mapping quality.
REFERENCES
[2] Kennith K. Chan, Cyrus C. Hay, John R. Keller, Gordon P. Kurpanek, Francis X. Schumacher, and Jason Zheng. Design of HP PA 7200 CPU. In Hewlett-Packard Journal, February 1996.
[3] C-H. Chi and H. Dietz. Improving cache performance by selective cache bypass. In the 22nd Hawaii International Conference on System Sciences, pages 277--285, January 1989.
[4] D. Chiou, P. Jain, S. Devadas, and L. Rudolph. Dynamic cache partitioning via columnization. In Proceedings of Design Automation Conference, Los Angeles, June 2000.
[6] M. Hirzel and T. Chilimbi. Bursty tracing: A framework for low-overhead temporal profiling. 2001.
[7] ILOG Inc. ILOG CPLEX 7.1 Reference Manual. 2001.
[8] Intel Corporation. Intel StrongARM SA-1110 microprocessor developer's manual. http://www.intel.com/design/strong/manuals/278240.htm, October 2001.
[9] Intel Corporation. Intel PXA250 and PXA210 application processor developer's manual. http://www.intel.com/design/pca/applicationsprocessors/manuals/278693.htm, February 2002.
[13] Zhiyuan Li and Rong Xu. Page mapping for heterogeneously partitioned caches: Complexity and heuristics. Journal of Embedded Computing, accepted.
[14] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal, 9:78--117, 1970.
[15] V. Milutinovic, M. Tomasevic, B. Markovic, and M. Tremblay. A new cache architecture concept: the split temporal/spatial cache. In Proceedings of 8th Mediterranean Electrotechnical Conference, pages 1108--1111, May 1996.
[16] Jude A. Rivers and Edward S. Davidson. Reducing conflicts in direct-mapped caches with a temporality-based design. In Proceedings of the 1996 International Conference on Parallel Processing, volume 1, pages 154--163, 1996.
[17] Jude A. Rivers, Edward S. Tam, Gary S. Tyson, Edward S. Davidson, and Matt Farrens. Utilizing reuse information in data cache management. In Proceedings of the 12th International Conference on Supercomputing, pages 449--456, Melbourne, Australia, July 1998. doi:10.1145/277830.277941.
[20] Gary Tyson, Matthew Farrens, John Matthews, and Andrew R. Pleszkun. A modified approach to data cache management. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 93--103, Ann Arbor, Michigan, November 29--December 1, 1995.
[22] Rong Xu and Zhiyuan Li. Using cache mapping to improve memory performance of handheld devices. In Proceedings of the 4th IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2004), pages 115--122, Austin, Texas, 2004.