Hill-climbing SMT processor resource distribution
Source
ACM Transactions on Computer Systems (TOCS) archive
Volume 27, Issue 1 (February 2009)
Article No. 1  
Year of Publication: 2009
ISSN:0734-2071
Authors
Seungryul Choi, Google
Donald Yeung, University of Maryland, College Park, MD
Publisher
ACM, New York, NY, USA


DOI: http://doi.acm.org/10.1145/1482619.1482620

ABSTRACT

The key to high performance in Simultaneous MultiThreaded (SMT) processors lies in optimizing the distribution of shared resources to active threads. Existing resource distribution techniques optimize performance only indirectly. They infer potential performance bottlenecks by observing indicators, like instruction occupancy or cache miss counts, and take actions to try to alleviate them. While the corrective actions are designed to improve performance, their actual performance impact is not known since end performance is never monitored. Consequently, potential performance gains are lost whenever the corrective actions do not effectively address the actual bottlenecks occurring in the pipeline.
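To make the "indirect" style of optimization concrete, here is a minimal sketch of an occupancy-driven fetch policy in the spirit of ICOUNT. ICOUNT gives fetch priority to the threads with the fewest instructions in the front-end and issue-queue stages. The function name `icount_fetch_order`, the per-thread occupancy counts, and the default of two fetching threads are illustrative assumptions. They are not the paper's interface or any simulator's. The policy consults only an indicator (instruction occupancy) and never observes the IPC that results from its choice.

```python
from typing import Dict, List


def icount_fetch_order(in_flight: Dict[int, int], fetch_threads: int = 2) -> List[int]:
    """Pick the threads allowed to fetch this cycle, ICOUNT-style.

    in_flight maps thread id -> number of that thread's instructions currently
    in the decode/rename/issue-queue stages (a hypothetical occupancy counter).
    Threads with the lowest count get priority; ties break on thread id.
    """
    # Rank by the occupancy indicator only; measured performance is never used.
    order = sorted(in_flight, key=lambda tid: (in_flight[tid], tid))
    return order[:fetch_threads]


if __name__ == "__main__":
    # Thread 1 has the fewest queued instructions, so it fetches first.
    print(icount_fetch_order({0: 12, 1: 3, 2: 7}))  # [1, 2]
```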

We propose a different approach to SMT resource distribution that optimizes end performance directly. Our approach observes the impact that resource distribution decisions have on performance at runtime, and feeds this information back to the resource distribution mechanisms to improve future decisions. By evaluating many different resource distributions, our approach tries to learn the best distribution over time. Because we perform learning online, learning time is crucial. We develop a hill-climbing algorithm that quickly learns the best distribution of resources by following the performance gradient within the resource distribution space. We also develop several ideal learning algorithms to enable deeper insights through limit studies.
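The hill-climbing idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. A single pool of `total` resource units (for example, issue-queue or rename-register entries) is partitioned among threads. `measure` is a hypothetical callback that runs one epoch with a given distribution and returns a performance figure such as weighted IPC, and `delta` is an assumed step size. Each round runs one trial per thread in which that thread receives `delta` extra units taken from the others, then moves to the best trial, stopping when no trial improves. For contrast, `exhaustive_best` evaluates every partition. It is an idealized, offline stand-in for a limit study, not one of the paper's ideal learning algorithms.

```python
from itertools import combinations
from typing import Callable, List, Optional

Perf = Callable[[List[int]], float]  # hypothetical: distribution -> epoch performance


def shift(share: List[int], t: int, delta: int) -> Optional[List[int]]:
    """Move `delta` units to thread t, taken round-robin from the other threads.

    Every thread keeps at least one unit; returns None if that is impossible.
    """
    trial = list(share)
    donors = [i for i in range(len(share)) if i != t]
    moved = 0
    while moved < delta:
        progressed = False
        for d in donors:
            if moved < delta and trial[d] > 1:
                trial[d] -= 1
                trial[t] += 1
                moved += 1
                progressed = True
        if not progressed:
            return None  # donors exhausted: no valid trial in this direction
    return trial


def hill_climb(num_threads: int, total: int, measure: Perf,
               delta: int = 4, max_rounds: int = 100) -> List[int]:
    """Greedy hill-climbing over partitions of `total` resource units."""
    # Start from an (almost) equal partition.
    share = [total // num_threads] * num_threads
    for i in range(total % num_threads):
        share[i] += 1
    for _ in range(max_rounds):
        best_perf = measure(share)  # one epoch at the current distribution
        best_trial = None
        for t in range(num_threads):
            trial = shift(share, t, delta)  # one trial epoch favoring thread t
            if trial is None:
                continue
            perf = measure(trial)
            if perf > best_perf:
                best_perf, best_trial = perf, trial
        if best_trial is None:
            break  # no neighboring distribution is better: a (possibly local) maximum
        share = best_trial  # step in the direction of the best observed gain
    return share


def exhaustive_best(num_threads: int, total: int, measure: Perf) -> Optional[List[int]]:
    """Idealized oracle: evaluate every partition in which each thread gets >= 1 unit."""
    best: Optional[List[int]] = None
    best_perf = float("-inf")
    for cuts in combinations(range(1, total), num_threads - 1):
        bounds = (0, *cuts, total)
        share = [bounds[i + 1] - bounds[i] for i in range(num_threads)]
        perf = measure(share)
        if perf > best_perf:
            best, best_perf = share, perf
    return best


if __name__ == "__main__":
    # Synthetic, noise-free objective peaking at a 40/24 split (illustrative only).
    target = [40, 24]
    perf = lambda s: -sum((x - y) ** 2 for x, y in zip(s, target))
    print(hill_climb(2, 64, perf))       # [40, 24]
    print(exhaustive_best(2, 64, perf))  # [40, 24]
```

On a smooth objective like this one, the greedy climb reaches the same distribution as the exhaustive search while evaluating far fewer candidates. With a real, noisy per-epoch `measure`, a single comparison per trial can point the climb the wrong way. On a non-concave objective, the stop rule can also strand the search at a local maximum that `exhaustive_best` would avoid. The sketch ignores both effects.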

This article presents an in-depth investigation of hill-climbing SMT resource distribution using a comprehensive suite of 63 multiprogrammed workloads. Our results show that hill-climbing outperforms ICOUNT, FLUSH, and DCRA (three existing SMT techniques) by 11.4%, 11.5%, and 2.8%, respectively, under the weighted IPC metric. A limit study conducted using our ideal learning algorithms shows that our approach can potentially outperform the same techniques by 19.2%, 18.0%, and 7.6%, respectively, demonstrating that room for further improvement exists. Using our ideal algorithms, we also identify three bottlenecks that limit online learning speed: local maxima, phased behavior, and interepoch jitter. We define metrics to quantify these learning bottlenecks and characterize the extent to which they occur in our workloads. Finally, we conduct a sensitivity study and investigate several extensions to improve our hill-climbing technique.
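Weighted IPC is commonly defined as the sum, over co-scheduled threads, of each thread's IPC in the SMT run divided by its IPC when run alone. Some papers divide this sum by the number of threads. The sketch below assumes the unnormalized sum and shows how a relative improvement, such as the reported 11.4%, follows from two policies' weighted IPCs. The function names and all IPC values are hypothetical and are not taken from the article.

```python
from typing import Sequence


def weighted_ipc(smt_ipc: Sequence[float], single_ipc: Sequence[float]) -> float:
    """Sum of per-thread IPC in the SMT run, each normalized to its solo IPC."""
    if len(smt_ipc) != len(single_ipc):
        raise ValueError("need one solo IPC per thread")
    return sum(s / a for s, a in zip(smt_ipc, single_ipc))


def improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`; 0.114 corresponds to 11.4%."""
    return new / baseline - 1.0


if __name__ == "__main__":
    solo = [2.0, 1.0]                       # hypothetical single-thread IPCs
    base = weighted_ipc([1.2, 0.5], solo)   # 0.6 + 0.5 = 1.1
    tuned = weighted_ipc([1.0, 0.7], solo)  # 0.5 + 0.7 = 1.2
    print(f"{improvement(tuned, base):.1%}")  # 9.1%
```

Both policies run the same set of threads. Dividing by the thread count would therefore scale numerator and denominator equally, so the percentage improvement is the same under either convention.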


