|
ABSTRACT
A thread executing on a simultaneous multithreading (SMT) processor that experiences a long-latency load will eventually stall while holding execution resources. Existing long-latency load aware SMT fetch policies limit the amount of resources allocated by a stalled thread by identifying long-latency loads and preventing the thread from fetching more instructions—and in some implementations, instructions beyond the long-latency load are flushed to release allocated resources. This article proposes an SMT fetch policy that takes into account the available memory-level parallelism (MLP) in a thread. The key idea proposed in this article is that in case of an isolated long-latency load (i.e., there is no MLP), the thread should be prevented from allocating additional resources. However, in case multiple independent long-latency loads overlap (i.e., there is MLP), the thread should allocate as many resources as needed in order to fully expose the available MLP. MLP-aware fetch policies achieve better performance for MLP-intensive threads on SMT processors, leading to higher overall system throughput and shorter average turnaround time than previously proposed fetch policies.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Anant Agarwal , John Kubiatowicz , David Kranz , Beng-Hong Lim , Donald Yeung , Godfrey D'Souza , Mike Parkin, Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors, IEEE Micro, v.13 n.3, p.48-61, May 1993
[doi> 10.1109/40.216748]
|
| |
2
|
Anderson, D. W., Sparacio, F. J., and Tomasulo, R. M. 1967. The IBM system/360 model 91: machine philosophy and instruction-handling. IBM J. Res. Dev. 11, 1, 8--24.
|
| |
3
|
|
| |
4
|
Francisco J. Cazorla , Alex Ramirez , Mateo Valero , Enrique Fernandez, Dynamically Controlled Resource Allocation in SMT Processors, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.171-182, December 04-08, 2004, Portland, Oregon
[doi> 10.1109/MICRO.2004.17]
|
 |
5
|
|
 |
6
|
|
 |
7
|
Jamison D. Collins , Hong Wang , Dean M. Tullsen , Christopher Hughes , Yong-Fong Lee , Dan Lavery , John P. Shen, Speculative precomputation: long-range prefetching of delinquent loads, Proceedings of the 28th annual international symposium on Computer architecture, p.14-25, June 30-July 04, 2001, Göteborg, Sweden
|
 |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
|
| |
12
|
Glew, A. 1998. MLP yes! ILP no! In Proceedings of the Architectural Support for Programming Languages and Operating Systems Wild and Crazy Idea Session.
|
| |
13
|
John, L. K. 2006. Aggregating performance metrics over a benchmark suite. In Performance Evaluation and Benchmarking, L. K. John and L. Eeckhout Eds. CRC Press, Boca Raton, FL, 47--58.
|
| |
14
|
Karkhanis, T. and Smith, J. E. 2002. A day in the life of a data cache miss. In Proceedings of the 2nd Annual Workshop on Memory Performance Issues (WMPI'02). ACM, New York.
|
 |
15
|
|
| |
16
|
|
 |
17
|
|
 |
18
|
|
| |
19
|
Luo, K., Gummaraju, J., and Franklin, M. 2001. Balancing throughput and fairness in SMT processors. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS'01). IEEE, Los Alamitos, CA, 164--171.
|
 |
20
|
|
| |
21
|
|
| |
22
|
|
| |
23
|
|
 |
24
|
|
| |
25
|
|
| |
26
|
Ramirez, T., Pajuelo, A., Santana, O. J., and Valero, M. 2008. Runahead threads to improve SMT performance. In Proceedings of the 14th International Symposium on High-Performance Computer Architecture (HPCA'08). IEEE, Los Alamitos, CA, 149--158.
|
 |
27
|
|
 |
28
|
|
 |
29
|
|
 |
30
|
Srikanth T. Srinivasan , Ravi Rajwar , Haitham Akkary , Amit Gandhi , Mike Upton, Continual flow pipelines, Proceedings of the 11th international conference on Architectural support for programming languages and operating systems, October 07-13, 2004, Boston, MA, USA
|
 |
31
|
|
| |
32
|
|
| |
33
|
|
| |
34
|
Tullsen, D. 1996. Simulation and modeling of a simultaneous multithreading processor. In Proceedings of the 22nd Annual Computer Measurement Group Conference. Curran Associates, Red Hook, NY.
|
| |
35
|
|
 |
36
|
Dean M. Tullsen , Susan J. Eggers , Joel S. Emer , Henry M. Levy , Jack L. Lo , Rebecca L. Stamm, Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, Proceedings of the 23rd annual international symposium on Computer architecture, p.191-202, May 22-24, 1996, Philadelphia, Pennsylvania, United States
|
 |
37
|
|
| |
38
|
Eric Tune , Rakesh Kumar , Dean M. Tullsen , Brad Calder, Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.183-194, December 04-08, 2004, Portland, Oregon
[doi> 10.1109/MICRO.2004.8]
|
 |
39
|
|
|