ABSTRACT
Miss rate curves (MRCs) are useful in a number of contexts. In our research, online L2 cache MRCs enable us to dynamically identify optimal cache sizes when cache-partitioning a shared-cache multicore processor. Obtaining L2 MRCs in software has generally been assumed to be expensive; consequently, their use for online optimizations has been limited. To address this limitation, we have developed a low-overhead software technique to obtain L2 MRCs online on current processors, exploiting features available in their performance monitoring units so that no changes to the application source code or binaries are required. Our technique, called RapidMRC, requires a single probing period of roughly 221 million processor cycles (147 ms), followed by 124 million cycles (83 ms) to process the data. We demonstrate its accuracy by comparing the obtained MRCs to the actual L2 MRCs of 30 applications taken from SPECcpu2006, SPECcpu2000, and SPECjbb2000. We show that RapidMRC can be applied to sizing cache partitions, helping to achieve performance improvements of up to 27%.
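To make the two ideas in the abstract concrete, the following is a minimal sketch of how a miss rate curve can be computed from a trace of cache-line accesses and then used to size two cache partitions. It is not RapidMRC itself: the abstract does not describe the algorithm, so the sketch assumes a fully associative LRU cache and uses the classic LRU stack-distance approach. The function names (`miss_rate_curve`, `best_split`), the toy traces, and the cost function (sum of the two miss rates) are all hypothetical choices made for illustration.

```python
def miss_rate_curve(trace, max_lines):
    """Miss rate for each cache size 1..max_lines (fully associative LRU assumed)."""
    stack = []                                # LRU stack, most recently used first
    hits_at_distance = [0] * (max_lines + 1)  # index = 1-based stack distance
    for line in trace:
        if line in stack:
            depth = stack.index(line) + 1     # a cache of >= depth lines would hit
            stack.remove(line)
            if depth <= max_lines:
                hits_at_distance[depth] += 1
        stack.insert(0, line)                 # first-time accesses are cold misses
    total = len(trace)
    curve, hits = [], 0
    for size in range(1, max_lines + 1):
        hits += hits_at_distance[size]        # hits accumulate as the cache grows
        curve.append((total - hits) / total)
    return curve


def best_split(mrc_a, mrc_b, total_lines):
    """Pick the partition sizes (a, b), a + b == total_lines, minimizing
    the sum of the two miss rates (an assumed, simplistic cost function)."""
    best_a, best_cost = None, None
    for a in range(1, total_lines):
        cost = mrc_a[a - 1] + mrc_b[total_lines - a - 1]
        if best_cost is None or cost < best_cost:
            best_a, best_cost = a, cost
    return best_a, total_lines - best_a


# Hypothetical traces: A loops over 4 cache lines, B loops over 2.
trace_a = [0, 1, 2, 3] * 5
trace_b = [0, 1] * 10
mrc_a = miss_rate_curve(trace_a, 5)
mrc_b = miss_rate_curve(trace_b, 5)
print(mrc_a)
print(mrc_b)
print(best_split(mrc_a, mrc_b, 6))
```

In this toy setting, application A misses on every access until it is given four lines, so the search gives it four of the six lines and leaves two for B. A real system would weight each curve by access frequency and would have to account for set associativity and the granularity at which partitions can actually be allocated.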
INDEX TERMS
Primary Classification:
C. Computer Systems Organization
C.4 PERFORMANCE OF SYSTEMS
Subjects: Measurement techniques
Additional Classification:
C. Computer Systems Organization
C.4 PERFORMANCE OF SYSTEMS
Subjects: Modeling techniques
D. Software
D.4 OPERATING SYSTEMS
D.4.8 Performance
Subjects: Modeling and prediction; Measurements
I. Computing Methodologies
I.6 SIMULATION AND MODELING
I.6.4 Model Validation and Analysis
General Terms: Experimentation, Management, Measurement, Performance
Keywords: cache management, cache partitioning, chip multiprocessor, dynamic optimization, hardware performance counters, miss rate curve, multicore processor, online optimization, performance monitoring unit, shared cache, shared cache management