ABSTRACT
There is an increasing trend to use commodity microprocessors as the compute engines in large-scale multiprocessors. However, given that the majority of the microprocessors are sold in the workstation market, not in the multiprocessor market, it is only natural that architectural features that benefit only multiprocessors are less likely to be adopted in commodity microprocessors. In this paper, we explore multiple-context processors, an architectural technique proposed to hide the large memory latency in multiprocessors. We show that while current multiple-context designs work reasonably well for multiprocessors, they are ineffective in hiding the much shorter uniprocessor latencies using the limited parallelism found in workstation environments. We propose an alternative design that combines the best features of two existing approaches, and present simulation results that show it yields better performance for both multiprogrammed workloads on a workstation and parallel applications on a multiprocessor. By addressing the needs of the workstation environment, our proposal makes multiple contexts more attractive for commodity microprocessors.