| Height reduction of control recurrences for ILP processors |
| Full text |
Pdf
(1.36 MB)
|
| Source
|
International Symposium on Microarchitecture
archive
Proceedings of the 27th annual international symposium on Microarchitecture
table of contents
San Jose, California, United States
Pages: 40 - 51
Year of Publication: 1994
ISBN:0-89791-707-3
|
|
Authors
|
|
Michael Schlansker
|
Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA
|
|
Vinod Kathail
|
Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA
|
|
Sadun Anik
|
Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 3, Downloads (12 Months): 11, Citation Count: 12
|
|
|
ABSTRACT
The performance of applications executing on processors with instruction level parallelism is often limited by control and data dependences. Performance bottlenecks caused by dependences can frequently be eliminated through transformations which reduce the height of critical paths through the program. While height reduction techniques are not always helpful, their utility can be demonstrated in a broad range of important situations.This paper focuses on the height reduction of control recurrences within loops with data dependent exits. Loops with exits are transformed so as to alleviate performance bottlenecks resulting from control dependences. A compilation approach to effect these transformations is described. The techniques presented in this paper used in combination with prior work on reducing the height of data dependences provide a comprehensive approach to accelerating loops with conditional exits.In many cases, loops with conditional exits provide a degree of parallelism traditionally associated with vectorization. Multiple iterations of a loop can be retired in a single cycle on a processor with adequate instruction level parallelism with no cost in code redundancy. In more difficult cases, height reduction requires redundant computation or may not be feasible.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
| |
2
|
A. Nicolau and J. A. Fisher. Measuring the parallelism available for very long instruction word architectures. IEEE Transactions on Computers C-33, 11 (1984), 968-976.
|
 |
3
|
|
 |
4
|
|
 |
5
|
|
| |
6
|
|
| |
7
|
|
| |
8
|
J.A. Fisher. Trace scheduling: A technique for global microcode compaction. IEEE Transactions on Computers C- 30, 7 (1981), 478-490.
|
| |
9
|
|
| |
10
|
Wen-Mei W. Hwu , Scott A. Mahlke , William Y. Chen , Pohua P. Chang , Nancy J. Warter , Roger A. Bringmann , Roland G. Ouellette , Richard E. Hank , Tokuzo Kiyohara , Grant E. Haab , John G. Holm , Daniel M. Lavery, The superblock: an effective technique for VLIW and superscalar compilation, The Journal of Supercomputing, v.7 n.1-2, p.229-248, May 1993
[doi> 10.1007/BF01205185]
|
 |
11
|
Scott A. Mahlke , William Y. Chen , Roger A. Bringmann , Richard E. Hank , Wen-Mei W. Hwu , B. Ramakrishna Rau , Michael S. Schlansker, Sentinel scheduling: a model for compiler-controlled speculative execution, ACM Transactions on Computer Systems (TOCS), v.11 n.4, p.376-408, Nov. 1993
[doi> 10.1145/161541.159765]
|
| |
12
|
J.A. Fisher. Global code generation for instruction-level parallelism: trace scheduling-2. Technical Report HPL-93- 43, Hewlett-Packard Laboratories. Palo Alto CA, 1993.
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
 |
16
|
|
 |
17
|
|
| |
18
|
B. R. Rau. Cydra 5 directed dataflow architecture. Proceedings of the COMPCON '88 (San Francisco, 1988). 106-113.
|
| |
19
|
B. Ramakrishna Rau , David W. L. Yen , Wei Yen , Ross A. Towie, The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs, Computer, v.22 n.1, p.12-26, 28-30, 32-35, January 1989
[doi> 10.1109/2.19820]
|
| |
20
|
J.C. H, Park and M. S. Schlansker. On predicated execution. Technical Report HPL-91-58, Hewlett-Packard Laboratories, Palo Alto CA, 1991.
|
| |
21
|
V. Kathail, M. S. Schlansker, and B. R. Rau. HPL PlayDoh architecture specification: Version 1.0. Technical Report HPL-93-80, Hewlett-Packard Laboratories, Palo Alto CA, 1993.
|
| |
22
|
|
| |
23
|
|
 |
24
|
|
 |
25
|
James C. Dehnert , Peter Y.-T. Hsu , Joseph P. Bratt, Overlapped loop support in the Cydra 5, Proceedings of the third international conference on Architectural support for programming languages and operating systems, p.26-38, April 03-06, 1989, Boston, Massachusetts, United States
|
| |
26
|
|
 |
27
|
Nancy J. Warter , Scott A. Mahlke , Wen-Mei W. Hwu , B. Ramakrishna Rau, Reverse If-Conversion, Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, p.290-299, June 21-25, 1993, Albuquerque, New Mexico, United States
|
 |
28
|
B. Ramakrishna Rau , Michael S. Schlansker , P. P. Tirumalai, Code generation schema for modulo scheduled loops, Proceedings of the 25th annual international symposium on Microarchitecture, p.158-169, December 01-04, 1992, Portland, Oregon, United States
|
| |
29
|
M.S. Schlansker, V. Kathail, and S. Anik. Parallelization of control recurrences for ILP processors. Technical Report HPL-94-75, Hewlett-Packard Laboratories, Palo Alto Ca., 1994.
|
CITED BY 12
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
David I. August , Wen-mei W. Hwu , Scott A. Mahlke, A framework for balancing control flow and predication, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.92-103, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
|
|
|
|
|
|
|
|
|
|
|
Richard E. Hank , Wen-Mei W. Hwu , B. Ramakrishna Rau, Region-based compilation: an introduction and motivation, Proceedings of the 28th annual international symposium on Microarchitecture, p.158-168, November 29-December 01, 1995, Ann Arbor, Michigan, United States
|
|
|
|
|
|
|
INDEX TERMS
Primary Classification:
C.
Computer Systems Organization
C.4
PERFORMANCE OF SYSTEMS
Subjects:
Performance attributes
Additional Classification:
B.
Hardware
B.1
CONTROL STRUCTURES AND MICROPROGRAMMING
C.
Computer Systems Organization
C.0
GENERAL
Subjects:
Instruction set design (e.g., RISC, CISC, VLIW)
C.1
PROCESSOR ARCHITECTURES
D.
Software
D.3
PROGRAMMING LANGUAGES
D.3.4
Processors
Subjects:
Optimization
General Terms:
Design,
Performance
Keywords:
back-substitution,
blocked back-substitution,
control dependences,
control height reduction,
loop optimization,
parallelism,
recurrences,
software pipeline
|