ABSTRACT
The performance of pipelined processors is severely limited by data dependencies. To achieve high performance, a mechanism must exist to alleviate the effects of data dependencies. If a pipelined CPU with multiple functional units is to be used in the presence of a virtual memory hierarchy, a mechanism must also exist for determining the state of the machine precisely. In this paper, we combine the issues of dependency resolution and preciseness of state. We present a design for instruction issue logic that resolves dependencies dynamically and, at the same time, guarantees a precise state of the machine, without significant hardware overhead. We also present detailed simulation studies of the proposed mechanism, using the Lawrence Livermore loops as a benchmark.