ABSTRACT
When a long-latency instruction such as an L2 miss occurs, the issue queue (IQ) may fill with instructions that depend on the miss; consequently, the IQ cannot expose instruction-level parallelism until the miss is resolved. In the context of memory-latency-tolerant processors, we propose delaying the insertion into the IQ of instructions that depend on load instructions predicted to miss in L2. These instructions are stored in an instruction buffer instead of being inserted into the IQ. Once the L2 miss is resolved, the dependent instructions are inserted into the IQ. Results show that the proposal reduces the total number of replays by 37% (integer benchmarks) to 61% (floating-point benchmarks), the average performance degradation is at most 2%, and the average overall chip energy consumption is reduced by around 8% in floating-point benchmarks.
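The dispatch policy described in the abstract can be illustrated with a small behavioural sketch. It is not the authors' implementation: the names `Instr`, `Dispatcher`, `delay_buffer`, and `resolve_miss` are hypothetical, the L2-miss prediction is supplied as a flag rather than computed, registers are tracked by name without renaming, and instruction issue and replay are not modelled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()
    is_load: bool = False
    predicted_l2_miss: bool = False  # assumed to come from some miss predictor


class Dispatcher:
    """Hypothetical model: route instructions to the IQ or to a delay buffer."""

    def __init__(self):
        self.issue_queue = []        # instructions inserted into the IQ
        self.delay_buffer = deque()  # (instruction, set of pending loads)
        self.tainted = {}            # register -> loads it transitively waits on

    def dispatch(self, ins):
        # Collect the predicted-miss loads that the source registers depend on.
        pending = set()
        for reg in ins.srcs:
            pending |= self.tainted.get(reg, set())

        own_miss = ins.is_load and ins.predicted_l2_miss
        if pending:
            # Dependent on a predicted L2 miss: keep it out of the IQ for now.
            self.delay_buffer.append((ins, pending))
            if ins.dest is not None:
                taint = set(pending)
                if own_miss:
                    taint.add(ins.name)
                self.tainted[ins.dest] = taint
            return

        # Independent instruction: insert into the IQ as usual.
        self.issue_queue.append(ins)
        if ins.dest is not None:
            if own_miss:
                self.tainted[ins.dest] = {ins.name}
            else:
                self.tainted.pop(ins.dest, None)

    def resolve_miss(self, load_name):
        # The miss is resolved: clear the taint it caused.
        for reg in list(self.tainted):
            self.tainted[reg].discard(load_name)
            if not self.tainted[reg]:
                del self.tainted[reg]
        # Move buffered instructions with no remaining pending loads to the IQ.
        still_waiting = deque()
        for ins, pending in self.delay_buffer:
            pending.discard(load_name)
            if pending:
                still_waiting.append((ins, pending))
            else:
                self.issue_queue.append(ins)
        self.delay_buffer = still_waiting
```

In this sketch, an `add` that reads the destination of a predicted-miss load is held in `delay_buffer`, while an unrelated `mul` enters `issue_queue` immediately; calling `resolve_miss` with the load's name then releases the buffered instructions in program order.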