An empirical study of decentralized ILP execution models
Source
Architectural Support for Programming Languages and Operating Systems
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
San Jose, California, United States
Pages: 272 - 281
Year of Publication: 1998
ISBN:1-58113-107-0
Authors
Narayan Ranganathan, System Validation, 2501 NW 229th, RA2-302, Intel Corporation, Hillsboro, OR
Manoj Franklin, Electrical Engineering Department, University of Maryland, College Park, MD
ABSTRACT
The recent fascination with dynamic scheduling as a means of exploiting instruction-level parallelism has generated significant interest in the scalability of dynamic scheduling hardware. To overcome the scalability problems of centralized hardware schedulers, many decentralized execution models have recently been proposed and investigated. The crux of all these models is to split the instruction window across multiple processing elements (PEs) that schedule instructions independently. The decentralized execution models proposed so far can be grouped into three categories, based on the criterion used for assigning an instruction to a particular PE: (i) execution unit dependence based decentralization (EDD), (ii) control dependence based decentralization (CDD), and (iii) data dependence based decentralization (DDD). This paper investigates the performance aspects of these three decentralization approaches. Using a suite of important benchmarks and realistic system parameters, we examine performance differences resulting from the type of partitioning as well as from specific implementation issues such as the type of PE interconnect.
We found that with a ring-type PE interconnect, the DDD approach performs best when the number of PEs is moderate, and the CDD approach performs best when the number of PEs is large. The currently used approach, EDD, does not perform well for any configuration. With a realistic crossbar, performance does not increase with the number of PEs for any of the partitioning approaches. The results give insight into the best way to use the transistor budget available for implementing the instruction window.
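The three assignment criteria named in the abstract can be illustrated with a toy sketch. The abstract does not spell out the exact policies, so everything below is an assumption made for illustration: the `Instr` record, the three `assign_*` functions, and the round-robin tie-breaking are hypothetical and are not taken from the paper. EDD is read here as "pick the PE by the kind of execution unit the instruction needs", CDD as "keep a control-flow-contiguous block on one PE", and DDD as "place an instruction on the PE that produces one of its operands".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    op_class: str                 # execution unit type, e.g. "int", "fp", "mem"
    dest: Optional[str]           # destination register, or None
    srcs: Tuple[str, ...] = ()    # source registers
    ends_block: bool = False      # True if this instruction closes a block


def assign_edd(instrs, num_pes):
    """EDD (assumed policy): one PE per execution unit class."""
    class_to_pe = {}
    out = []
    for ins in instrs:
        if ins.op_class not in class_to_pe:
            # New classes are handed out round-robin over the PEs.
            class_to_pe[ins.op_class] = len(class_to_pe) % num_pes
        out.append(class_to_pe[ins.op_class])
    return out


def assign_cdd(instrs, num_pes):
    """CDD (assumed policy): each block goes to one PE, blocks rotate."""
    out = []
    pe = 0
    for ins in instrs:
        out.append(pe)
        if ins.ends_block:
            pe = (pe + 1) % num_pes   # next block goes to the next PE
    return out


def assign_ddd(instrs, num_pes):
    """DDD (assumed policy): follow the PE of the first known producer."""
    producer_pe = {}   # register name -> PE that last wrote it
    out = []
    next_pe = 0
    for ins in instrs:
        pe = next((producer_pe[s] for s in ins.srcs if s in producer_pe), None)
        if pe is None:
            # No producer in the window: start a new chain on the next PE.
            pe = next_pe
            next_pe = (next_pe + 1) % num_pes
        out.append(pe)
        if ins.dest is not None:
            producer_pe[ins.dest] = pe
    return out


# A made-up six-instruction program for comparing the three policies.
program = [
    Instr("int", "r1"),
    Instr("int", "r2", ("r1",)),
    Instr("fp", "f1"),
    Instr("mem", "r3", ("r2",)),
    Instr("int", None, ("r3",), ends_block=True),
    Instr("fp", "f2", ("f1",)),
]

for policy in (assign_edd, assign_cdd, assign_ddd):
    print(policy.__name__, policy(program, 3))
```

Under these assumptions, the same program is spread differently by each policy: the EDD sketch separates the integer, floating-point, and memory instructions; the CDD sketch keeps the first five instructions together until the block ends; and the DDD sketch keeps each dependence chain on a single PE. The sketch says nothing about performance, which in the paper depends on the number of PEs and the PE interconnect.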
INDEX TERMS
Primary Classification:
F. Theory of Computation
F.2 ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY
F.2.2 Nonnumerical Algorithms and Problems
Subjects: Sequencing and scheduling
Additional Classification:
C. Computer Systems Organization
C.0 GENERAL
Subjects: Instruction set design (e.g., RISC, CISC, VLIW)
D. Software
D.4 OPERATING SYSTEMS
D.4.1 Process Management
Subjects: Scheduling
F. Theory of Computation
F.1 COMPUTATION BY ABSTRACT DEVICES
F.1.2 Modes of Computation
Subjects: Parallelism and concurrency
G. Mathematics of Computing
G.4 MATHEMATICAL SOFTWARE
Subjects: Algorithm design and analysis
General Terms: Algorithms, Design, Experimentation, Measurement, Performance, Theory
Keywords: control dependence, data dependence, decentralization, dynamic scheduling, execution unit dependence, hardware window, instruction-level parallelism, speculative execution