ABSTRACT
Chip-multiprocessor (CMP) architectures present a challenge for efficient simulation, combining the requirements of a detailed microprocessor simulator with those of a tightly-coupled parallel system. This paper presents a distributed simulator for target CMPs, built on the Message Passing Interface (MPI) and designed to run on a host cluster of workstations. A microbenchmark-based evaluation narrows the parallelization design space by measuring the performance impact of four choices: distributed vs. centralized target L2 simulation, blocking vs. non-blocking remote cache accesses, null-message vs. barrier techniques for clock synchronization, and network interconnect selection. The best combination is shown to yield speedups of up to 16 on a 9-node cluster of dual-CPU workstations, partially due to cache effects.
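To make the clock-synchronization trade-off concrete, the following is a minimal single-process sketch contrasting the two techniques named in the abstract. It is not the paper's implementation: the names `run_barrier`, `run_null_message`, `quantum`, and `lookahead` are hypothetical, no MPI calls are made, and the simulated processors are stepped round-robin in one Python process purely for illustration.

```python
from dataclasses import dataclass, field


def run_barrier(num_lps, quantum, end_time):
    """Barrier style: every simulated CPU advances one quantum, then all wait."""
    clocks = [0] * num_lps
    barriers = 0
    while clocks[0] < end_time:
        step = min(quantum, end_time - clocks[0])
        clocks = [c + step for c in clocks]  # all CPUs simulate the same window
        barriers += 1                        # one global synchronization per window
    return clocks, barriers


@dataclass
class LogicalProcess:
    name: str
    clock: int = 0
    # Per-neighbor lower bound on the timestamp of any future message.
    channel_clock: dict = field(default_factory=dict)

    def safe_time(self):
        # No neighbor can send anything earlier than its last promise.
        return min(self.channel_clock.values())


def run_null_message(num_lps, lookahead, end_time):
    """Null-message style: each CPU advances to the minimum promise it holds,
    then promises its neighbors it will send nothing before clock + lookahead."""
    lps = [LogicalProcess(f"cpu{i}") for i in range(num_lps)]
    for lp in lps:
        # A neighbor at time 0 cannot affect us before `lookahead`.
        lp.channel_clock = {o.name: lookahead for o in lps if o is not lp}
    null_messages = 0
    while any(lp.clock < end_time for lp in lps):
        for lp in lps:
            lp.clock = min(lp.safe_time(), end_time)
            for other in lps:
                if other is not lp:
                    other.channel_clock[lp.name] = lp.clock + lookahead
                    null_messages += 1
    return [lp.clock for lp in lps], null_messages


if __name__ == "__main__":
    print(run_barrier(num_lps=2, quantum=2, end_time=10))
    print(run_null_message(num_lps=2, lookahead=2, end_time=10))
```

In this sketch the barrier variant pays one global synchronization per time window, while the null-message variant pays one point-to-point message per neighbor per advance; which cost dominates on a real cluster depends on the interconnect and on the lookahead available in the target model.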