| Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor |
| Full text |
Pdf
(155 KB)
|
| Source
|
International Conference on Supercomputing
archive
Proceedings of the 15th international conference on Supercomputing
table of contents
Sorrento, Italy
Pages: 368 - 380
Year of Publication: 2001
ISBN:1-58113-410-X
|
|
Authors
|
|
Chong-Liang Ooi
|
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN
|
|
Seon Wook Kim
|
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN
|
|
Il Park
|
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN
|
|
Rudolf Eigenmann
|
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN
|
|
Babak Falsafi
|
Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA
|
|
T. N. Vijaykumar
|
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 3, Downloads (12 Months): 31, Citation Count: 14
|
|
|
ABSTRACT
Recent proposals for Chip Multiprocessors (CMPs) advocate speculative, or implicit, threading in which the hardware employs prediction to peel off instruction sequences (i.e., implicit threads) from the sequential execution stream and speculatively executes them in parallel on multiple processor cores. These proposals augment a conventional multiprocessor, which employs explicit threading, with the ability to handle implicit threads. Current proposals focus on only implicitly-threaded code sections. This paper identifies, for the first time, the issues in combining explicit and implicit threading. We present the Multiplex architecture to combine the two threading models. Multiplex exploits the similarities between implicit and explicit threading, and provides a unified support for the two threading models without additional hardware. Multiplex groups a subset of protocol states in an implicitly-threaded CMP to provide a write-invalidate protocol for explicit threads.
Using a fully-integrated compiler infrastructure for automatic generation of Multiplex code, this paper presents a detailed performance analysis for entire benchmarks, instead of just implicitly-threaded sections, as done in previous papers. We show that neither threading models alone performs consistently better than the other across the benchmarks. A CMP with four dual-issue CPUs achieves a speedup of 1.48 and 2.17 over one dual-issue CPU, using implicit-only and explicit-only threading, respectively. Multiplex matches or outperforms the better of the two threading models for every benchmark, and a four-CPU Multiplex achieves a speedup of 2.63. Our detailed analysis indicates that the dominant overheads in an implicitly-threaded CMP are speculation state overflow due to limited L1 cache capacity, and load imbalance and data dependences in fine-grain threads.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Vikas Agarwal , M. S. Hrishikesh , Stephen W. Keckler , Doug Burger, Clock rate versus IPC: the end of the road for conventional microarchitectures, Proceedings of the 27th annual international symposium on Computer architecture, p.248-259, June 2000, Vancouver, British Columbia, Canada
|
| |
2
|
U. Banerjee, R. Eigenmann, A. Nicolau, and D. A. Padua. Automatic program parallelization. Proceedings of the IEEE, 81(2):211-243, Feb. 1993.
|
| |
3
|
M. Berry, D. Chen, P. Koss, D. Kuck, L. Pointer, S. Lo, Y. Pang, R. Roloff, A. Sameh, E. Clementi, S. Chin, D. Schneider, G. Fox, P. Messina, D. Walker, C. Hsiung, J. Schwarzmeier, K. Lue, S. Orszag, F. Seidl, O. Johnson, G. Swanson, R. Goodrum, and J. Martin. The Perfect Club Benchmarks: Effective performance evaluation of supercomputers. International Journal of Supercomputer Applications, 3(3):5-40, 1989.
|
| |
4
|
William Blume , Ramon Doallo , Rudolf Eigenmann , John Grout , Jay Hoeflinger , Thomas Lawrence , Jaejin Lee , David Padua , Yunheung Paek , Bill Pottenger , Lawrence Rauchwerger , Peng Tu, Parallel Programming with Polaris, Computer, v.29 n.12, p.78-82, December 1996
[doi> 10.1109/2.546612]
|
| |
5
|
|
 |
6
|
Scott E. Breach , T. N. Vijaykumar , Gurindar S. Sohi, The anatomy of the register file in a multiscalar processor, Proceedings of the 27th annual international symposium on Microarchitecture, p.181-190, November 30-December 02, 1994, San Jose, California, United States
[doi> 10.1145/192724.192750]
|
| |
7
|
B. Case. Spec95 retires spec92. Microprocessor Report, August 21 1995.
|
 |
8
|
|
 |
9
|
|
 |
10
|
|
 |
11
|
Gina Goff , Ken Kennedy , Chau-Wen Tseng, Practical dependence testing, Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation, p.15-29, June 24-28, 1991, Toronto, Ontario, Canada
|
| |
12
|
|
 |
13
|
Junjie Gu , Zhiyuan Li , Gyungho Lee, Experience with efficient array data flow analysis for array privatization, Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.157-167, June 18-21, 1997, Las Vegas, Nevada, United States
|
| |
14
|
Mary W. Hall , Jennifer M. Anderson , Saman P. Amarasinghe , Brian R. Murphy , Shih-Wei Liao , Edouard Bugnion , Monica S. Lam, Maximizing Multiprocessor Performance with the SUIF Compiler, Computer, v.29 n.12, p.84-89, December 1996
[doi> 10.1109/2.546613]
|
| |
15
|
|
 |
16
|
Lance Hammond , Mark Willey , Kunle Olukotun, Data speculation support for a chip multiprocessor, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.58-69, October 02-07, 1998, San Jose, California, United States
|
| |
17
|
|
| |
18
|
M. Horowitz, R. Ho, and K. Mai. The future of wires. In Proceedings of the Semiconductor Research Corporation Workshop on Interconnects for Systems on a Chip, May 1999.
|
 |
19
|
|
 |
20
|
Andreas Moshovos , Scott E. Breach , T. N. Vijaykumar , Gurindar S. Sohi, Dynamic speculation and synchronization of data dependences, Proceedings of the 24th annual international symposium on Computer architecture, p.181-193, June 01-04, 1997, Denver, Colorado, United States
|
 |
21
|
|
| |
22
|
|
 |
23
|
Subbarao Palacharla , Norman P. Jouppi , J. E. Smith, Complexity-effective superscalar processors, Proceedings of the 24th annual international symposium on Computer architecture, p.206-218, June 01-04, 1997, Denver, Colorado, United States
|
 |
24
|
|
| |
25
|
|
| |
26
|
|
 |
27
|
|
 |
28
|
J. Greggory Steffan , Christopher B. Colohan , Antonia Zhai , Todd C. Mowry, A scalable approach to thread-level speculation, Proceedings of the 27th annual international symposium on Computer architecture, p.1-12, June 2000, Vancouver, British Columbia, Canada
|
| |
29
|
|
| |
30
|
M. Tremblay. An architecture for the new millennium. In Proceedings of the 1999 Hot Chips Symposium, August 1999.
|
| |
31
|
|
| |
32
|
|
| |
33
|
|
| |
34
|
|
| |
35
|
|
|