ABSTRACT
By exploiting fine-grain parallelism, superscalar processors can potentially increase the performance of future supercomputers. However, supercomputers typically have a long access delay to their first-level memory, which can severely restrict the performance of superscalar processors. Compilers attempt to move load instructions far enough ahead of their uses to hide this latency, but conventional movement of load instructions is limited by data dependence analysis: a load cannot safely be moved above a store unless the analysis shows that the two reference different locations. This paper introduces a simple hardware scheme, referred to as preload register update, that allows the compiler to move load instructions even when the results of data dependence analysis are inconclusive. Preload register update keeps the load destination registers coherent when load instructions are moved past store instructions that reference the same location. With this addition, superscalar processors can more effectively tolerate longer data access latencies.
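The coherence idea in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class `PreloadMachine`, its per-register address table, and the `commit` step that ends tracking are all hypothetical names and mechanisms introduced here for illustration. The model hoists a load above a store that may or may not alias with it and checks that the loaded register ends up with the same value as in the original program order.

```python
from dataclasses import dataclass, field


@dataclass
class PreloadMachine:
    """Toy model (hypothetical): memory, registers, and live preload tracking."""
    memory: dict = field(default_factory=dict)
    regs: dict = field(default_factory=dict)
    # Assumption: each preloaded register remembers the address it came from.
    preload_addr: dict = field(default_factory=dict)

    def preload(self, reg, addr):
        """A load hoisted above possibly aliasing stores; record its address."""
        self.regs[reg] = self.memory.get(addr, 0)
        self.preload_addr[reg] = addr

    def store(self, addr, value):
        """Write memory, then update any preloaded register watching addr."""
        self.memory[addr] = value
        for reg, watched in self.preload_addr.items():
            if watched == addr:
                self.regs[reg] = value

    def commit(self, reg):
        """Assumed marker at the load's original position; stop tracking reg."""
        self.preload_addr.pop(reg, None)
        return self.regs[reg]


def original_order(memory, store_addr):
    """Reference semantics: store 7 first, then load from address 100."""
    mem = dict(memory)
    mem[store_addr] = 7
    return mem.get(100, 0)


def hoisted_order(memory, store_addr):
    """Load moved above the store, kept coherent by the register update."""
    m = PreloadMachine(memory=dict(memory))
    m.preload("r1", 100)
    m.store(store_addr, 7)
    return m.commit("r1")


if __name__ == "__main__":
    initial = {100: 3, 200: 4}
    for store_addr in (100, 200):  # aliasing store, then non-aliasing store
        print(store_addr, original_order(initial, store_addr),
              hoisted_order(initial, store_addr))
```

In this model the compiler does not need to know whether the store address equals the load address: in the aliasing case the store refreshes the register, and in the non-aliasing case the register is left alone, so both orders agree.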