ABSTRACT
The role of the operating system (OS) in managing shared resources such as CPU time, memory, peripherals, and even energy is well motivated and understood [23]. Unfortunately, one key resource, the lower-level shared cache in chip multi-processors, is commonly managed purely in hardware by rudimentary replacement policies such as least-recently-used (LRU). The rigid nature of the hardware cache management policy poses a serious problem because no single cache management policy is best across all sharing scenarios. For example, the policy for a scenario where applications from a single organization run under a "best effort" performance expectation is likely to differ from the policy for a scenario where applications from competing business entities (say, at a third-party data center) run under a minimum service level expectation.

When it comes to managing shared caches, there is an inherent tension between flexibility and performance. On one hand, managing the shared cache in the OS offers immense policy flexibility, since the policy may be implemented in software. Unfortunately, it is prohibitively expensive in terms of performance for the OS to be involved in managing temporally fine-grained events such as cache allocation. On the other hand, sophisticated hardware-only cache management techniques that achieve fair sharing or throughput maximization have been proposed, but they offer no policy flexibility.

This paper addresses this problem by designing architectural support for the OS to efficiently manage shared caches with a wide variety of policies. Our scheme consists of a hardware cache quota management mechanism, an OS interface, and a set of OS-level quota orchestration policies. The hardware mechanism guarantees that OS-specified quotas are enforced in shared caches, thus eliminating the need for (and the performance penalty of) temporally fine-grained OS intervention. The OS retains policy flexibility since it can tune the quotas during regularly scheduled OS interventions. We demonstrate that our scheme can support a wide range of policies, including policies that provide (a) passive performance differentiation, (b) reactive fairness by miss-rate equalization, and (c) reactive performance differentiation.
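To make policy (b) concrete, the following is a minimal sketch of what one regularly scheduled OS intervention for miss-rate equalization might look like. It is not the paper's algorithm or interface: the names `AppCounters` and `rebalance_quotas`, the per-application counters, and the `step`, `min_quota`, and `tolerance` parameters are all assumptions made for illustration. The sketch only assumes that the hardware enforces whatever quotas the OS last wrote and exposes per-application access and miss counts.

```python
from dataclasses import dataclass


@dataclass
class AppCounters:
    """Hypothetical per-application state read by the OS at each intervention."""
    quota: int      # cache quota units currently granted (e.g. blocks or ways)
    accesses: int   # shared-cache accesses since the last intervention
    misses: int     # shared-cache misses since the last intervention

    @property
    def miss_rate(self) -> float:
        # Treat an idle application as having a miss rate of zero.
        return self.misses / self.accesses if self.accesses else 0.0


def rebalance_quotas(apps, step=1, min_quota=1, tolerance=0.01):
    """Move up to `step` quota units from the application with the lowest
    miss rate to the one with the highest, then return the new quotas.

    `apps` maps an application name to its AppCounters. Total quota is
    conserved, and no application drops below `min_quota`.
    """
    if len(apps) >= 2:
        worst = max(apps, key=lambda name: apps[name].miss_rate)
        best = min(apps, key=lambda name: apps[name].miss_rate)
        gap = apps[worst].miss_rate - apps[best].miss_rate
        # Never take more than the donor can give up.
        movable = min(step, apps[best].quota - min_quota)
        if gap > tolerance and movable > 0:
            apps[best].quota -= movable
            apps[worst].quota += movable
    # In a real system these values would be written to the hardware
    # quota registers; here they are simply returned.
    return {name: app.quota for name, app in apps.items()}
```

Calling `rebalance_quotas` once per scheduling interval nudges the quotas toward equal miss rates while the hardware enforces them between calls, which mirrors the division of labor described above: coarse-grained policy in software, fine-grained enforcement in hardware.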
REFERENCES
[2] M. J. Buco, R. N. Chang, L. Z. Luan, C. Ward, J. L. Wolf, P. S. Yu, Utility computing SLA management based upon business objectives, IBM Systems Journal, v.43 n.1, p.159-178, January 2004
[3] D. R. Cheriton, A. Gupta, P. D. Boyle, H. A. Goosen, The VMP multiprocessor: initial experience, refinements, and performance evaluation, Proceedings of the 15th Annual International Symposium on Computer Architecture, p.410-421, May 30-June 02, 1988, Honolulu, Hawaii, United States
[4] D. Chiou, P. Jain, S. Devadas, and L. Rudolph. Dynamic Cache Partitioning via Columnization. In Proceedings of Design Automation Conference, Los Angeles, June 2000.
[5] Joseph R. Eykholt, Steve R. Kleiman, Steve Barton, Roger Faulkner, Anil Shivalingiah, Mark Smith, Dan Stein, Jim Voll, Mary Weeks, and Dock Williams. Beyond Multiprocessing: Multithreading the SunOS Kernel. In Proceedings of the Summer 1992 USENIX Technical Conference and Exhibition, pages 11-18, San Antonio, TX, USA, 1992.
[12] Philip Machanick, Pierre Salverda, Lance Pompe, Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.105-114, October 02-07, 1998, San Jose, California, United States
[13] Peter S. Magnusson, Magnus Christensson, Jesper Eskilson, Daniel Forsgren, Gustav Hållberg, Johan Högberg, Fredrik Larsson, Andreas Moestedt, Bengt Werner, Simics: A Full System Simulation Platform, Computer, v.35 n.2, p.50-58, February 2002. doi:10.1109/2.982916
[14] Milo M. K. Martin, Daniel J. Sorin, Bradford M. Beckmann, Michael R. Marty, Min Xu, Alaa R. Alameldeen, Kevin E. Moore, Mark D. Hill, David A. Wood, Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset, ACM SIGARCH Computer Architecture News, v.33 n.4, November 2005. doi:10.1145/1105734.1105747
[15] Jim Mauro. The Solaris Process Model: Managing Thread Execution and Wait Times in the System Clock Handler, 2000.
[19] Nauman Rafique, Won-Taek Lim, and Mithuna Thottethodi. Architectural Support for Operating System-Driven CMP Cache Management. Technical Report TR ECE 06-11, Purdue University, 2006.
[21] Alex Settle, Dan Connors, Enric Gibert, and Antonio Gonzalez. A Dynamically Reconfigurable Cache for Multithreaded Processors. Journal of Embedded Computing: Special Issue on Single-Chip Multi-core Architectures, December 2005.
[22] P. Shivakumar and Norm P. Jouppi. CACTI 3.0: An Integrated Cache Timing, Power and Area Model. Technical Report, 2001.
[26] sun.com. Solaris Containers-Resource Management and Solaris Zones. In System Administration Guide.
CITED BY 21
Fei Guo, Hari Kannan, Li Zhao, Ramesh Illikkal, Ravi Iyer, Don Newell, Yan Solihin, Christos Kozyrakis, From chaos to QoS: case studies in CMP resource management, ACM SIGARCH Computer Architecture News, v.35 n.1, March 2007
Ravi Iyer, Li Zhao, Fei Guo, Ramesh Illikkal, Srihari Makineni, Don Newell, Yan Solihin, Lisa Hsu, Steve Reinhardt, QoS policies and architecture for cache/memory in CMP platforms, ACM SIGMETRICS Performance Evaluation Review, v.35 n.1, June 2007
Yunlian Jiang, Xipeng Shen, Jie Chen, Rahul Tripathi, Analysis and approximation of optimal co-scheduling on chip multiprocessors, Proceedings of the 17th international conference on Parallel architectures and compilation techniques, October 25-29, 2008, Toronto, Ontario, Canada
Lu Peng, Jih-Kwon Peir, Tribuvan K. Prakash, Carl Staelin, Yen-Kuang Chen, David Koppelman, Memory hierarchy performance measurement of commercial dual-core desktop processors, Journal of Systems Architecture: the EUROMICRO Journal, v.54 n.8, p.816-828, August 2008
Andrew Herdrich, Ramesh Illikkal, Ravi Iyer, Don Newell, Vineet Chadha, Jaideep Moses, Rate-based QoS techniques for cache/memory in CMP platforms, Proceedings of the 23rd international conference on Supercomputing, June 08-12, 2009, Yorktown Heights, NY, USA