ABSTRACT
Multithreading is widely used to increase processor throughput. As the number of shared resources increases, managing them while guaranteeing predictable performance becomes a major problem. Previous work has attempted to ease this problem through various fairness mechanisms. In this article, we present a new approach that controls resource allocation and sharing via a service level agreement (SLA)-based mechanism; that is, via an agreement in which multithreaded processors guarantee a minimal level of service to the running threads. We introduce a new metric, CSLA, for conformance to SLA in multithreaded processors and show that controlling resources using SLA allows for higher gains than are achievable by previously suggested fairness techniques. It also permits improving one metric (e.g., power) while maintaining SLA in another (e.g., performance). We compare SLA enforcement to schemes based on other fairness metrics, which are mostly targeted at equalizing execution parameters. We show that using SLA rather than fairness-based algorithms provides a range of acceptable execution points from which we can select the point that best fits our optimization target, such as maximizing the weighted speedup (sum of the speedups of the individual threads) or reducing power. We demonstrate the effectiveness of the new SLA approach using switch-on-event (coarse-grained) multithreading. Our weighted speedup improvement scheme successfully enforces SLA while improving the weighted speedup by an average of 10% for unbalanced threads. This result is significant when compared with the performance losses that may be incurred by fairness enforcement methods. When optimizing for power reduction in unbalanced threads, SLA enforcement reduces the power by an average of 15%. SLA may be complemented by other power reduction methods to achieve further power savings while maintaining the same service level for the threads.
We also demonstrate differentiated SLA, where weighted speedup is maximized while each thread may have a different throughput constraint.
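
The idea of selecting among acceptable execution points can be illustrated with a minimal Python sketch. It is not the article's mechanism: the `ExecutionPoint` structure, the function names, the per-thread minimum speedups, and all numeric values are hypothetical, and the sketch assumes that each thread's speedup is measured relative to that thread running alone. It only shows how an SLA (a per-thread lower bound) defines a set of acceptable points, from which one is chosen to maximize the weighted speedup or to minimize power, and how a differentiated SLA gives each thread its own bound.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ExecutionPoint:
    """One candidate operating point (hypothetical structure, not from the article)."""
    label: str
    speedups: Tuple[float, ...]  # per-thread speedup, assumed relative to running alone
    power: float                 # relative power in arbitrary units (assumed)


def weighted_speedup(point: ExecutionPoint) -> float:
    """Weighted speedup: the sum of the speedups of the individual threads."""
    return sum(point.speedups)


def meets_sla(point: ExecutionPoint, min_speedups: Sequence[float]) -> bool:
    """True if every thread receives at least its agreed minimal speedup."""
    return all(s >= m for s, m in zip(point.speedups, min_speedups))


def select_point(points: Sequence[ExecutionPoint],
                 min_speedups: Sequence[float],
                 target: str = "speedup") -> Optional[ExecutionPoint]:
    """Pick the SLA-conforming point that best fits the optimization target."""
    acceptable = [p for p in points if meets_sla(p, min_speedups)]
    if not acceptable:
        return None  # no point satisfies the SLA
    if target == "power":
        return min(acceptable, key=lambda p: p.power)
    return max(acceptable, key=weighted_speedup)


# Hypothetical two-thread operating points; the numbers are illustrative only.
POINTS = [
    ExecutionPoint("A", (0.50, 0.50), 1.00),
    ExecutionPoint("B", (0.40, 0.75), 0.95),
    ExecutionPoint("C", (0.20, 0.95), 0.90),
    ExecutionPoint("D", (0.45, 0.45), 0.80),
]

if __name__ == "__main__":
    uniform_sla = (0.4, 0.4)          # same bound for both threads
    differentiated_sla = (0.45, 0.3)  # a different bound per thread
    print(select_point(POINTS, uniform_sla).label)
    print(select_point(POINTS, uniform_sla, target="power").label)
    print(select_point(POINTS, differentiated_sla).label)
```

In this toy data, point C has the highest weighted speedup but starves the first thread, so it is excluded by the SLA before any optimization takes place; the remaining points form the acceptable range from which the target picks.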