ABSTRACT
Cache hierarchies have traditionally been designed for use by a single application, thread, or core. As multi-threaded (MT) and multi-core (CMP) platform architectures emerge, and as their workloads range from single-threaded and multi-threaded applications to complex virtual machines (VMs), a shared cache will be consumed by different entities that generate heterogeneous memory access streams with different locality properties and varying memory sensitivity. Conventional cache management approaches that treat all memory accesses equally are therefore bound to result in inefficient space utilization and poor performance, even for applications with good locality properties. To address this problem, this paper presents a new cache management framework (CQoS) that (1) recognizes the heterogeneity in memory access streams, (2) introduces the notion of QoS to handle the varying degrees of locality and latency sensitivity, and (3) assigns priorities to streams based on latency sensitivity, locality degree, and application performance needs, and enforces those priorities. To achieve this, we propose CQoS options for priority classification, priority assignment, and priority enforcement. We briefly describe the CQoS priority classification and assignment options, which range from user-driven and developer-driven to compiler-detected and flow-based approaches. Our focus in this paper is on CQoS mechanisms for priority enforcement: (1) selective cache allocation, (2) static/dynamic set partitioning, and (3) heterogeneous cache regions. We discuss the architectural design and implementation complexity of these CQoS options. To evaluate their performance trade-offs, we modeled the options in a cache simulator and evaluated them on CMP platforms running network-intensive server workloads. Our simulation results show the effectiveness of the proposed options and make the case for CQoS in future multi-threaded/multi-core platforms, since it improves shared cache efficiency and, as a result, increases overall system performance.
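To make the first enforcement mechanism concrete, the following is a minimal sketch of selective cache allocation in a toy set-associative LRU cache. It is an illustration under stated assumptions, not the simulator or the policy evaluated in the paper: the class name `SelectiveAllocationCache`, the `allocating_priorities` parameter, the 64-byte line size, and the rule that non-allocating streams bypass the cache entirely on a miss are all hypothetical choices made for this example.

```python
from collections import OrderedDict


class SelectiveAllocationCache:
    """Toy set-associative LRU cache (hypothetical, for illustration only).

    Assumption: a stream whose priority is not in `allocating_priorities`
    may still hit on lines already present, but never allocates on a miss.
    """

    def __init__(self, num_sets, ways, allocating_priorities, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        self.allocating_priorities = set(allocating_priorities)
        # One OrderedDict per set; insertion order tracks LRU (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = {}
        self.misses = {}

    def access(self, address, priority):
        """Simulate one access; return True on a hit, False on a miss."""
        line = address // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        cache_set = self.sets[index]

        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            self.hits[priority] = self.hits.get(priority, 0) + 1
            return True

        self.misses[priority] = self.misses.get(priority, 0) + 1
        if priority in self.allocating_priorities:
            if len(cache_set) >= self.ways:
                cache_set.popitem(last=False)  # evict the LRU line
            cache_set[tag] = priority
        return False
```

With a single two-way set, for example, a "high" stream that touches addresses 0 and 64 keeps both lines resident while a "low" stream touches 128 and 192, provided only "high" is listed in `allocating_priorities`. If both priorities are allowed to allocate, the low-priority stream evicts both lines and the high-priority stream misses when it returns to them.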