| Requirements for optimal execution of oops with tests |
| Full text |
Pdf
(782 KB)
|
| Source
|
International Conference on Supercomputing
archive
Proceedings of the 2nd international conference on Supercomputing
table of contents
St. Malo, France
Pages: 230 - 237
Year of Publication: 1988
ISBN:0-89791-272-1
|
|
Author
|
|
A. Uht
|
Univ. of California, La Jolla, CA
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 9, Downloads (12 Months): 13, Citation Count: 3
|
|
|
ABSTRACT
Both the efficient execution of branch intensive code and knowing the bounds on same are important issues in computing in general and supercomputing in particular. In prior work, it has been suggested, implied, or left as a possible maximum, that the hardware needed to execute code with branches optimally, i.e., oracular performance, is exponentially dependent on the total number of dynamic branches to be executed, this number of branches being proportional at least to the number of iterations of the loop. For classes of code taking at least one cycle per iteration to execute, this is not the case. For loops containing one test (normally in the form of a Boolean recurrence of order 1), it is shown that the hardware necessary varies from exponential to polynomial in the length of the dependency cycle L, while execution time varies from one time cycle per iteration to less than L time cycles per iteration; the variation depends on specific code dependencies.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Baneajee, U. and Gajski, D. Fast Execution of Loops With IF Statements. IEEE Transactions on Computers C-33(11):1030-1033, November, 1984.
|
| |
3
|
Cytron, R. G. Doacross: Beyond Vectorization for Multiproc~ssors (Extended Abstract). In Proceedings of the 1986 International Conference on Parallel Processing, pages 836-844. Pennsylvania State University and the IEEE Computer Society, August, 1986.
|
 |
4
|
|
 |
5
|
|
| |
6
|
|
| |
7
|
Riseman, E. M. and Foster, C. C. The Inhibition of Potential Parallelism by Conditional Jumps. IEEE Transactions on Computers :1405-1411, December, 1972.
|
 |
8
|
Bogong Su , Shiyuan Ding , Jian Wang , Jinshi Xia, GURPR—a method for global software pipelining, Proceedings of the 20th annual workshop on Microprogramming, p.88-96, December 01-04, 1987, Colorado Springs, Colorado, United States
[doi> 10.1145/255305.255322]
|
| |
9
|
Tomasulo, R. M. An Efficient Algorithm for Expoiting Multiple Arithmetic Units. IBM Journal :25-33, january, 1967.
|
| |
10
|
|
| |
11
|
Uht, A. K. and Wedig, R. G. Hardware Extraction of Low-level Concurrency from Serial Instruction Streams. In Proceedings of the International Conference on Parallel Processing, pages 729-736. IEEE Computer Society and the Association for Computing Machinery, August, 1986.
|
| |
12
|
|
 |
13
|
Augustus K. Uht , Constantine D. Polychronopoulos , John F. Kolen, On the combination of hardware and software concurrency extraction methods, Proceedings of the 20th annual workshop on Microprogramming, p.133-141, December 01-04, 1987, Colorado Springs, Colorado, United States
[doi> 10.1145/255305.255331]
|
CITED BY 3
|
|
H. Yamana , T. Hagiwara , J. Kohdate , Y. Muraoka, A preceding activation scheme with graph unfolding for the parallel processing system-array, Proceedings of the 1989 ACM/IEEE conference on Supercomputing, p.675-684, November 12-17, 1989, Reno, Nevada, United States
|
|
|
|
|
|
|
|