|
ABSTRACT
For maximum performance, an out-of-order processor must issue load instructions as early as possible, while avoiding memory-order violations with prior store instructions that write to the same memory location. One approach is to use memory dependence prediction to identify the stores upon which a load depends, and communicate that information to the instruction scheduler. We designate the set of stores upon which each load has depended as the load's "store set". The processor can discover and use a load's store set to accurately predict the earliest time the load can safely execute. We show that store sets accurately predict memory dependencies in the context of large instruction window, superscalar machines, and allow for near-optimal performance compared to an instruction scheduler with perfect knowledge of memory dependencies. In addition, we explore the implementation aspects of store sets, and describe a low cost implementation that achieves nearly optimal performance.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
K. Ycagcr. The MIPS R10000 Supcrscalar Microprocessor. Hot Chips VII, 1995.
|
| |
3
|
|
| |
4
|
Intel Corporation. Pentium(R) Pro Developer's Manual. McGraw-Hill, June 1997.
|
 |
5
|
Andreas Moshovos , Scott E. Breach , T. N. Vijaykumar , Gurindar S. Sohi, Dynamic speculation and synchronization of data dependences, Proceedings of the 24th annual international symposium on Computer architecture, p.181-193, June 01-04, 1997, Denver, Colorado, United States
|
| |
6
|
J. Hesson, J. LeBlanc, S. Ciavaglia. Apparatus to Dynamically Control the Out-Of-Order Execution Of Load-Store Instructions, US. Patent 5,615,350, Filed Dec. 1995, Issued Mar. 1997.
|
| |
7
|
|
| |
8
|
|
| |
9
|
S. McFarling. Combining Branch Predictors. WRL Technical Note TN-36, June I993.
|
| |
10
|
|
| |
11
|
S.Steely, D. Sager, D. Fite. Memory Reference Tagging US Patent 5,619,662 Filed Aug. 1994, Issued Apr. 1997.
|
| |
12
|
|
CITED BY 85
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Byung-Kwon Chung , Jinsuo Zhang , Jih-Kwon Peir , Shih-Chang Lai , Konrad Lai, Direct load: dependence-linked dataflow resolution of load address and cache coordinate, Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, December 01-05, 2001, Austin, Texas
|
|
|
Jih-Kwon Peir , Shih-Chang Lai , Shih-Lien Lu , Jared Stark , Konrad Lai, Bloom filtering cache misses for accurate data speculation and prefetching, Proceedings of the 16th international conference on Supercomputing, June 22-26, 2002, New York, New York, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Michael Bekerman , Adi Yoaz , Freddy Gabbay , Stephan Jourdan , Maxim Kalaev , Ronny Ronen, Early load address resolution via register tracking, ACM SIGARCH Computer Architecture News, v.28 n.2, p.306-315, May 2000
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Amirali Baniasadi , Andreas Moshovos, Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors, Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, p.337-347, December 2000, Monterey, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Stephen Jourdan , Ronny Ronen , Michael Bekerman , Bishara Shomar , Adi Yoaz, A novel renaming scheme to exploit value temporal locality through physical register reuse and unification, Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, p.216-225, November 1998, Dallas, Texas, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Mingju Li , Elizabeth Varki , Swapnil Bhatia , Arif Merchant, TaP: table-based prefetching for storage caches, Proceedings of the 6th USENIX Conference on File and Storage Technologies, p.1-16, February 26-29, 2008, San Jose, California
|
|
|
Antonia Zhai , Christopher B. Colohan , J. Gregory Steffan , Todd C. Mowry, Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, p.39, March 20-24, 2004, Palo Alto, California
|
|
|
Neil Vachharajani , Matthew J. Bridges , Jonathan Chang , Ram Rangan , Guilherme Ottoni , Jason A. Blome , George A. Reis , Manish Vachharajani , David I. August, RIFLE: An Architectural Framework for User-Centric Information-Flow Security, Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, p.243-254, December 04-08, 2004, Portland, Oregon
|
|
|
|
|
|
|
|
|
Deniz Balkan , Joseph Sharkey , Dmitry Ponomarev , Kanad Ghose, Selective writeback: exploiting transient values for energy-efficiency and performance, Proceedings of the 2006 international symposium on Low power electronics and design, October 04-06, 2006, Tegernsee, Bavaria, Germany
|
|
|
|
|
|
Changpeng Fang , Steve Carr , Soner Önder , Zhenlin Wang, Feedback-directed memory disambiguation through store distance analysis, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
|
|
|
|
|
|
|
|
|
|
|
|
Steven Swanson , Andrew Schwerin , Martha Mercaldi , Andrew Petersen , Andrew Putnam , Ken Michelson , Mark Oskin , Susan J. Eggers, The WaveScalar architecture, ACM Transactions on Computer Systems (TOCS), v.25 n.2, p.4-es, May 2007
|
|
|
Deniz Balkan , Joseph Sharkey , Dmitry Ponomarev , Kanad Ghose, SPARTAN: speculative avoidance of register allocations to transient values for performance and energy efficiency, Proceedings of the 15th international conference on Parallel architectures and compilation techniques, September 16-20, 2006, Seattle, Washington, USA
|
|
|
|
|
|
|
|
|
Smruti R. Sarangi , Wei Liu, Josep Torrellas , Yuanyuan Zhou, ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing, Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, p.257-270, November 12-16, 2005, Barcelona, Spain
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Aaron Smith , Jon Gibson , Bertrand Maher , Nick Nethercote , Bill Yoder , Doug Burger , Kathryn S. McKinle , Jim Burrill, Compiling for EDGE Architectures, Proceedings of the International Symposium on Code Generation and Optimization, p.185-195, March 26-29, 2006
|
|
|
|
|
|
|
|
|
|
|
|
Fernando Castro , Luis Pinuel , Daniel Chaver , Manuel Prieto , Michael Huang , Francisco Tirado, DMDC: Delayed Memory Dependence Checking through Age-Based Filtering, Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, p.297-308, December 09-13, 2006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Miquel Pericàs , Adrian Cristal , Francisco J. Cazorla , Ruben González , Alex Veidenbaum , Daniel A. Jiménez , Mateo Valero, A Two-Level Load/Store Queue Based on Execution Locality, ACM SIGARCH Computer Architecture News, v.36 n.3, p.25-36, June 2008
|
|
|
|
|
|
F. Castro , D. Chaver , L. Pinuel , M. Prieto , F. Tirado, Using age registers for a simple load-store queue filtering, Journal of Systems Architecture: the EUROMICRO Journal, v.55 n.2, p.79-89, February, 2009
|
|
|
|
|
|
|
|
|
|
|