ABSTRACT
Buffering more in-flight instructions in an out-of-order microprocessor is a straightforward and effective way to tolerate the long latencies generally associated with off-chip memory accesses. One of the main challenges of buffering a large number of instructions, however, is implementing a scalable and efficient mechanism to detect memory access order violations that result from out-of-order scheduling of load and store instructions. Traditional CAM-based associative queues can be very slow and consume considerable energy. In this paper, instead of using the traditional age-based load queue to record load addresses, we explicitly record age information in address-indexed hash tables to achieve the same functionality: detecting premature loads. This alternative design eliminates associative searches and significantly reduces the energy consumption of the load queue. With simple techniques to reduce the number of false positives, performance degradation is kept to a minimum.
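To make the idea of an address-indexed table that stores ages more concrete, here is a minimal software sketch in Python. It is an illustration under stated assumptions, not the design evaluated in the paper: the class name `AgeHashTable`, the parameters `num_entries` and `line_bytes`, the modulo hash, and the policy of keeping only the youngest load age per entry are all hypothetical choices made for this example. Ages are modeled as unbounded sequence numbers (larger means younger), and entries are never cleared, which real hardware would have to handle.

```python
class AgeHashTable:
    """Toy model: each entry holds the age of the youngest executed load
    whose address hashes to that entry (hypothetical policy)."""

    def __init__(self, num_entries=256, line_bytes=8):
        self.num_entries = num_entries
        self.line_bytes = line_bytes
        # -1 means "no load recorded"; real ages are non-negative.
        self.youngest_load_age = [-1] * num_entries

    def _index(self, address):
        # Simple modulo hash on the aligned address (an assumption).
        return (address // self.line_bytes) % self.num_entries

    def record_load(self, address, age):
        """Called when a load executes: remember the youngest age seen."""
        i = self._index(address)
        if age > self.youngest_load_age[i]:
            self.youngest_load_age[i] = age

    def store_conflicts(self, address, age):
        """Called when a store resolves its address. Returns True if a
        younger load may already have read this location (a possibly
        premature load). Hash aliasing can cause false positives."""
        return self.youngest_load_age[self._index(address)] > age


table = AgeHashTable(num_entries=4, line_bytes=8)
table.record_load(0x100, age=7)          # load with age 7 executes early

print(table.store_conflicts(0x100, 5))   # True: older store, same address
print(table.store_conflicts(0x100, 9))   # False: store is younger than the load
print(table.store_conflicts(0x108, 5))   # False: maps to a different entry
print(table.store_conflicts(0x120, 5))   # True: different address, same entry
```

The lookup is a single indexed read rather than an associative search over all in-flight loads, which is the property the abstract highlights. The last call shows the cost: address `0x120` aliases with `0x100` in this four-entry table, so the check reports a violation that did not actually occur. A false positive of this kind only triggers an unnecessary replay, so it affects performance, not correctness.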
CITED BY
Francisco J. Mesa-Martínez, Michael C. Huang, Jose Renau, SEED: scalable, efficient enforcement of dependences, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, September 16-20, 2006, Seattle, Washington, USA
Fernando Castro, Luis Pinuel, Daniel Chaver, Manuel Prieto, Michael Huang, Francisco Tirado, DMDC: Delayed Memory Dependence Checking through Age-Based Filtering, Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, p.297-308, December 09-13, 2006
F. Castro, D. Chaver, L. Pinuel, M. Prieto, F. Tirado, Using age registers for a simple load-store queue filtering, Journal of Systems Architecture: the EUROMICRO Journal, v.55 n.2, p.79-89, February, 2009