ABSTRACT
As chip multiprocessors (CMPs) become increasingly mainstream, architects have become more interested in how best to share a cache hierarchy among multiple simultaneous threads of execution. The problem grows more complex as the number of simultaneous threads rises from two or four to tens or hundreds. However, there is no consensus in the architectural community on what "best" means in this context. Some papers in the literature seek to equalize each thread's performance loss due to sharing, while others emphasize maximizing overall system performance. Furthermore, the specific effect of these goals varies depending on the metric used to define "performance".

In this paper we label equal-performance targets as Communist cache policies and overall-performance targets as Utilitarian cache policies. We compare both of these models to the most common current model, a free-for-all cache (a Capitalist policy). We consider various performance metrics, including miss rates, bandwidth usage, and IPC, using both absolute and relative values of each metric. Using analytical models and behavioral cache simulation, we find that the optimal partitioning of a shared cache can vary greatly as different but reasonable definitions of optimality are applied. We also find that, although Communist and Utilitarian targets are generally compatible, each policy has workloads for which it performs poorly: Communist policies can yield poor overall performance, and Utilitarian policies can yield poor fairness. Finally, we find that simple policies like LRU replacement and static uniform partitioning are not sufficient to provide near-optimal performance under any reasonable definition, indicating that some thread-aware cache resource allocation mechanism is required.
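To make the distinction between the two targets concrete, the following is a minimal sketch, not the paper's method. It assumes a way-partitioned cache shared by two threads and uses invented per-thread miss curves (`curve_a`, `curve_b`); the function names and the choice of "relative miss increase versus owning the whole cache" as the fairness metric are likewise assumptions made for illustration. It searches every partition exhaustively and picks one under a Utilitarian goal (fewest total misses) and one under a Communist goal (most nearly equal relative loss).

```python
from itertools import product

def partitions(total_ways, n_threads):
    """Yield every allocation that gives each thread at least one way."""
    for alloc in product(range(1, total_ways + 1), repeat=n_threads):
        if sum(alloc) == total_ways:
            yield alloc

def utilitarian(miss_curves, total_ways):
    """Allocation that minimizes total misses summed over all threads."""
    return min(
        partitions(total_ways, len(miss_curves)),
        key=lambda alloc: sum(c[w] for c, w in zip(miss_curves, alloc)),
    )

def communist(miss_curves, total_ways):
    """Allocation that minimizes the spread in relative miss increase,
    measured against each thread owning the entire cache (an assumed metric)."""
    def spread(alloc):
        rel = [c[w] / c[total_ways] for c, w in zip(miss_curves, alloc)]
        return max(rel) - min(rel)
    return min(partitions(total_ways, len(miss_curves)), key=spread)

# Hypothetical miss counts indexed by number of ways (index 0 unused).
curve_a = [None, 1000, 900, 800, 700]  # many misses, small relative change
curve_b = [None, 50, 30, 20, 10]       # few misses, large relative change

print("Utilitarian:", utilitarian([curve_a, curve_b], 4))
print("Communist:  ", communist([curve_a, curve_b], 4))
```

With these invented curves the two goals choose opposite partitions: the Utilitarian search gives three of the four ways to the first thread, where each way saves the most misses in absolute terms, while the Communist search gives three ways to the second thread, whose relative loss is otherwise much larger. Exhaustive search is only practical for small way and thread counts.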