|
ABSTRACT
Higher microprocessor frequencies accentuate the performance cost of memory accesses. This is especially noticeable in the Intel's IA32 architecture where lack of registers results in increased number of memory accesses. This paper presents novel, non-speculative technique that partially hides the increasing load-to-use latency, by allowing the early issue of load instructions. Early load address resolution relies on register tracking to safely compute the addresses of memory references in the front-end part of the processor pipeline. Register tracking enables decode-time computation of register values by tracking simple operations of the form reg±immediate. Register tracking may be performed in any pipeline stage following instruction decode and prior to execution.
Several tracking schemes are proposed in this paper:
- Stack pointer tracking allows safe early resolution of stack references by keeping track of the value of the ESP register (the stack pointer). About 25% of all loads are stack loads and 95% of these loads may be resolved in the front-end.
- Absolute address tracking allows the early resolution of constant-address loads.
- Displacement-based tracking tackles all loads with addresses of the form reg±immediate by tracking the values of all general-purpose registers. This class corresponds to 82% of all loads, and about 65% of these loads can be safely resolved in the front-end pipeline.
The paper describes the tracking schemes, analyzes their performance potential in a deeply pipelined processor and discusses the integration of tracking with memory disambiguation.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
 |
3
|
|
 |
4
|
Michael Bekerman , Stephan Jourdan , Ronny Ronen , Gilad Kirshenboim , Lihu Rappoport , Adi Yoaz , Uri Weiser, Correlated load-address predictors, Proceedings of the 26th annual international symposium on Computer architecture, p.54-63, May 01-04, 1999, Atlanta, Georgia, United States
|
| |
5
|
|
 |
6
|
Sangyeun Cho , Pen-Chung Yew , Gyungho Lee, Decoupling local variable accesses in a wide-issue superscalar processor, Proceedings of the 26th annual international symposium on Computer architecture, p.100-110, May 01-04, 1999, Atlanta, Georgia, United States
|
 |
7
|
|
 |
8
|
|
| |
9
|
R. J. Eickemeyer and S. Vassiliadis, A Load-Instruction Unit for Pipelined Processors, in IBM Journal of Research and Development, July 93.
|
 |
10
|
|
 |
11
|
|
| |
12
|
Pentium Pro Family Developer Manual, Volume 2: Programmer s Reference Manual, Intel Corporation, 1996
|
| |
13
|
Stephen Jourdan , Ronny Ronen , Michael Bekerman , Bishara Shomar , Adi Yoaz, A novel renaming scheme to exploit value temporal locality through physical register reuse and unification, Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, p.216-225, November 1998, Dallas, Texas, United States
|
 |
14
|
Mikko H. Lipasti , Christopher B. Wilkerson , John Paul Shen, Value locality and load value prediction, Proceedings of the seventh international conference on Architectural support for programming languages and operating systems, p.138-147, October 01-04, 1996, Cambridge, Massachusetts, United States
|
 |
15
|
Andreas Moshovos , Scott E. Breach , T. N. Vijaykumar , Gurindar S. Sohi, Dynamic speculation and synchronization of data dependences, Proceedings of the 24th annual international symposium on Computer architecture, p.181-193, June 01-04, 1997, Denver, Colorado, United States
|
| |
16
|
|
| |
17
|
|
| |
18
|
R. Valentine, G. Sheaffer, R. Ronen, I. Spillinger and A. Yoaz, Out-of-order Superscalar Microprocessor with a Renaming Device that Maps Instructions from Memory to Registers, U.S. Patent 5,838,941, November 1998.
|
 |
19
|
Adi Yoaz , Mattan Erez , Ronny Ronen , Stephan Jourdan, Speculation techniques for improving load related instruction scheduling, Proceedings of the 26th annual international symposium on Computer architecture, p.42-53, May 01-04, 1999, Atlanta, Georgia, United States
|
CITED BY 15
|
|
Byung-Kwon Chung , Jinsuo Zhang , Jih-Kwon Peir , Shih-Chang Lai , Konrad Lai, Direct load: dependence-linked dataflow resolution of load address and cache coordinate, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
|
|
|
|
|
|
Michael Huang , Jose Renau , Seung-Moon Yoo , Josep Torrellas, L1 data cache decomposition for energy efficiency, Proceedings of the 2001 international symposium on Low power electronics and design, p.10-15, August 2001, Huntington Beach, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
David N. Armstrong , Hyesoon Kim , Onur Mutlu , Yale N. Patt, Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.119-128, December 04-08, 2004, Portland, Oregon
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|