ABSTRACT
To enable concurrent instruction execution, scientific computers generally rely on pipelining, which combines with faster system clocks to achieve greater throughput. Each concurrently executing instruction requires buffer space, usually implemented as a register, to receive its result. This paper focuses on how many registers are required to achieve optimal performance in pipelined scientific computers. Four machine models are considered: single-, double-, and triple-issue scalar machines, and vector machines with various register lengths. A model is presented that accurately relates the register requirements for optimum performance of cyclically scheduled loops with tree dependence graphs to the degree of function unit pipelining, the instruction issue bandwidth, and code properties. A method for finding upper and lower bounds on the minimum register requirements is also presented.
The result of this work is a theory for assessing register requirements that can be used to reveal fundamental differences among machines within a space of architectural and implementation design choices. Some experimental data is also provided to support the theory.