Adaptive reorder buffers for SMT processors
Source: Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT)
Seattle, Washington, USA
Session: Out-of-order microarchitecture
Pages: 244-253
Year of Publication: 2006
ISBN: 1-59593-264-X
Authors
Joseph Sharkey, State University of New York, Binghamton, NY
Deniz Balkan, State University of New York, Binghamton, NY
Dmitry Ponomarev, State University of New York, Binghamton, NY
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1152154.1152192

ABSTRACT

In SMT processors, the complex interplay between private and shared datapath resources needs to be considered in order to realize the full performance potential. In this paper, we show that blindly increasing the size of the per-thread reorder buffers to provide a larger number of in-flight instructions does not result in the expected performance gains but, quite in contrast, degrades the instruction throughput for virtually all multithreaded workloads. The reason for this performance loss is the excessive pressure on the shared datapath resources, especially the instruction scheduling logic. We propose intelligent mechanisms for dynamically adapting the number of reorder buffer entries allocated to each thread in an effort to avoid such allocations if they detrimentally impact the scheduler. We achieve this goal through categorizing the program execution into issue-bound and commit-bound phases and only performing the buffer allocations to the threads operating in commit-bound phases. Our adaptive technique achieves improvements of 21% in instruction throughput and 10% in the fairness metric compared to the best performing baseline configuration with static ROBs.
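
The allocation policy described in the abstract can be illustrated with a minimal sketch. The abstract does not say how phases are detected, so everything below is an assumption made for illustration: the counters (`rob_full_cycles`, `iq_waiting_insts`), the thresholds, the entry counts, and the function names are hypothetical and are not taken from the paper. The sketch only shows the general idea that extra reorder buffer entries go to threads classified as commit-bound, while issue-bound threads keep the base allocation so they do not add pressure on the shared scheduler.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    ISSUE_BOUND = "issue-bound"
    COMMIT_BOUND = "commit-bound"


@dataclass
class ThreadSample:
    """Hypothetical per-interval counters for one thread (not from the paper)."""
    thread_id: int
    rob_full_cycles: int   # cycles in which dispatch stalled on a full per-thread ROB
    iq_waiting_insts: int  # this thread's unissued instructions in the shared scheduler


def classify(sample, rob_full_threshold=64, iq_pressure_threshold=16):
    """Assumed heuristic: a thread is commit-bound when its ROB is often full
    while it holds few entries in the shared scheduler; otherwise issue-bound."""
    if (sample.rob_full_cycles >= rob_full_threshold
            and sample.iq_waiting_insts < iq_pressure_threshold):
        return Phase.COMMIT_BOUND
    return Phase.ISSUE_BOUND


def adapt_rob(samples, base_entries=32, extra_entries=32):
    """Return a ROB size per thread, granting extra entries only to
    threads classified as commit-bound."""
    allocation = {}
    for s in samples:
        if classify(s) is Phase.COMMIT_BOUND:
            allocation[s.thread_id] = base_entries + extra_entries
        else:
            allocation[s.thread_id] = base_entries
    return allocation


samples = [
    ThreadSample(thread_id=0, rob_full_cycles=200, iq_waiting_insts=4),
    ThreadSample(thread_id=1, rob_full_cycles=10, iq_waiting_insts=30),
    ThreadSample(thread_id=2, rob_full_cycles=150, iq_waiting_insts=40),
]
print(adapt_rob(samples))  # {0: 64, 1: 32, 2: 32}
```

In this toy run only thread 0 receives additional entries. Thread 2 also fills its ROB frequently, but it already occupies many scheduler slots, so under the assumed heuristic a larger ROB would mainly increase scheduler pressure.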



