ABSTRACT
We have developed a new technique for evaluating cache-coherent, shared-memory computers. The Wisconsin Wind Tunnel (WWT) runs a parallel shared-memory program on a parallel computer (the CM-5) and uses execution-driven, distributed, discrete-event simulation to accurately calculate program execution time. WWT is a virtual prototype that exploits similarities between the system under design (the target) and an existing evaluation platform (the host). The host directly executes all target program instructions and memory references that hit in the target cache. WWT's shared memory uses the CM-5 memory's error-correcting code (ECC) as valid bits for a fine-grained extension of shared virtual memory. Only memory references that miss in the target cache trap to WWT, which simulates a cache-coherence protocol. WWT correctly interleaves target machine events and calculates target program execution time. WWT runs on parallel computers with greater speed and memory capacity than uniprocessors. WWT's simulation time decreases as target system size increases for fixed-size problems and holds roughly constant as the target system and problem scale.
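The hit/miss split described in the abstract can be illustrated with a minimal sketch. This is not WWT's implementation: the class `TargetNode`, the function `run`, the block size, and the cycle costs are all hypothetical, a Python set stands in for the ECC-based valid bits, and a fixed latency stands in for the simulated cache-coherence protocol. It only shows the idea that hits proceed without simulator involvement while misses trap to a handler that advances the target clock.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 32     # bytes per target cache block (assumed value)
HIT_CYCLES = 1      # assumed target cost of a reference that hits
MISS_CYCLES = 100   # assumed fixed latency standing in for the protocol


@dataclass
class TargetNode:
    """One target processor; the set models per-block valid bits."""
    valid_blocks: set = field(default_factory=set)
    cycles: int = 0   # target execution time accumulated so far
    misses: int = 0   # number of references that trapped

    def reference(self, address: int) -> None:
        block = address // BLOCK_SIZE
        if block in self.valid_blocks:
            # Hit: no simulator work, only the target clock advances.
            self.cycles += HIT_CYCLES
        else:
            # Miss: the invalid block "traps" to the simulator.
            self.handle_miss(block)

    def handle_miss(self, block: int) -> None:
        # Placeholder for a coherence protocol: charge a fixed latency
        # and mark the block valid so later references hit.
        self.misses += 1
        self.cycles += MISS_CYCLES
        self.valid_blocks.add(block)


def run(addresses):
    """Replay a sequence of byte addresses on a fresh target node."""
    node = TargetNode()
    for address in addresses:
        node.reference(address)
    return node
```

For example, `run([0, 4, 32, 0, 64])` touches blocks 0, 0, 1, 0, 2, so three references trap and two hit.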