|
ABSTRACT
A mechanistic model for out-of-order superscalar processors is developed and then applied to the study of microarchitecture resource scaling. The model divides execution time into intervals separated by disruptive miss events such as branch mispredictions and cache misses. Each type of miss event results in characterizable performance behavior for the execution time interval. By considering an interval's type and length (measured in instructions), execution time can be predicted for the interval. Overall execution time is then determined by aggregating the execution time over all intervals. The mechanistic model provides several advantages over prior modeling approaches, and, when estimating performance, it differs from detailed simulation of a 4-wide out-of-order processor by an average of 7%. The mechanistic model is applied to the general problem of resource scaling in out-of-order superscalar processors. First, we use the model to determine size relationships among microarchitecture structures in a balanced processor design. Second, we use the mechanistic model to study scaling of both pipeline depth and width in balanced processor designs. We corroborate previous results in this area and provide new results. For example, we show that at optimal design points, the pipeline depth times the square root of the processor width is nearly constant. Finally, we consider the behavior of unbalanced, overprovisioned processor designs based on insight gained from the mechanistic model. We show that in certain situations an overprovisioned processor may lead to improved overall performance. Designs where a processor's dispatch width is wider than its issue width are of particular interest.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Vikas Agarwal , M. S. Hrishikesh , Stephen W. Keckler , Doug Burger, Clock rate versus IPC: the end of the road for conventional microarchitectures, Proceedings of the 27th annual international symposium on Computer architecture, p.248-259, June 2000, Vancouver, British Columbia, Canada
|
 |
2
|
|
| |
3
|
Brooks, D., Martonosi, M., and Bose, P. 2000. Abstraction via separable components: An empirical study of absolute and relative accuracy in processor performance modeling. Tech. rep. RC 21909, IBM Research Division, T. J. Watson Research Center. December.
|
 |
4
|
|
 |
5
|
|
 |
6
|
|
| |
7
|
|
| |
8
|
|
| |
9
|
|
| |
10
|
|
| |
11
|
|
 |
12
|
Stijn Eyerman , Lieven Eeckhout , Tejas Karkhanis , James E. Smith, A performance counter architecture for computing accurate CPI components, Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, October 21-25, 2006, San Jose, California, USA
|
| |
13
|
|
| |
14
|
Eyerman, S., Smith, J. E., and Eeckhout, L. 2006b. Characterizing the branch misprediction penalty. In IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 48--58.
|
 |
15
|
|
| |
16
|
Glew, A. 1998. MLP yes! ILP no! In ASPLOS Wild and Crazy Idea Session.
|
 |
17
|
|
 |
18
|
|
| |
19
|
|
 |
20
|
M. S. Hrishikesh , Doug Burger , Norman P. Jouppi , Stephen W. Keckler , Keith I. Farkas , Premkishore Shivakumar, The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays, Proceedings of the 29th annual international symposium on Computer architecture, May 25-29, 2002, Anchorage, Alaska
|
 |
21
|
Engin Ïpek , Sally A. McKee , Rich Caruana , Bronis R. de Supinski , Martin Schulz, Efficiently exploring architectural design spaces via predictive modeling, Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, October 21-25, 2006, San Jose, California, USA
|
| |
22
|
Joseph, P. J., Vaswani, K., and Thazhuthaveetil, M. J. 2006a. Construction and use of linear regression models for processor performance analysis. In Proceedings of the 12th International Symposium on High-Performance Computer Architecture (HPCA), 99--108.
|
| |
23
|
|
| |
24
|
Karkhanis, T. and Smith, J. E. 2002. A day in the life of a data cache miss. In Proceedings of the 2nd Annual Workshop on Memory Performance Issues (WMPI) held in conjunction with ISCA.
|
 |
25
|
|
 |
26
|
|
 |
27
|
|
 |
28
|
|
| |
29
|
|
| |
30
|
|
| |
31
|
|
 |
32
|
|
| |
33
|
|
| |
34
|
|
 |
35
|
|
 |
36
|
Daniel J. Sorin , Vijay S. Pai , Sarita V. Adve , Mary K. Vernon , David A. Wood, Analytic evaluation of shared-memory systems with ILP processors, Proceedings of the 25th annual international symposium on Computer architecture, p.380-391, June 27-July 02, 1998, Barcelona, Spain
|
 |
37
|
|
 |
38
|
Srikanth T. Srinivasan , Ravi Rajwar , Haitham Akkary , Amit Gandhi , Mike Upton, Continual flow pipelines, Proceedings of the 11th international conference on Architectural support for programming languages and operating systems, October 07-13, 2004, Boston, MA, USA
|
| |
39
|
Viji Srinivasan , David Brooks , Michael Gschwind , Pradip Bose , Victor Zyuban , Philip N. Strenski , Philip G. Emma, Optimizing pipelines for power and performance, Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, November 18-22, 2002, Istanbul, Turkey
|
| |
40
|
|
| |
41
|
|
 |
42
|
|
| |
43
|
|
|