ABSTRACT
Functional full-system simulators are powerful and versatile research tools for accelerating architectural exploration and advanced software development. Their main shortcoming is limited throughput when simulating large multiprocessor systems with hundreds or thousands of processors, or when instrumentation is introduced. We propose the ProtoFlex simulation architecture, which uses FPGAs to accelerate full-system multiprocessor simulation and to facilitate high-performance instrumentation. Prior FPGA approaches that prototype a complete system in hardware either become too complex when scaled to large configurations or require significant effort to provide full-system support. In contrast, ProtoFlex virtualizes the execution of many logical processors onto a consolidated number of multiple-context execution engines on the FPGA. Through virtualization, the number of engines can be scaled as needed to deliver the necessary simulation performance at a large savings in complexity. Further, to achieve low-complexity full-system support, a hybrid simulation technique called transplanting implements only the frequently encountered behaviors in the FPGA, while a software simulator preserves the abstraction of a complete system. We have created a first instance of the ProtoFlex simulation architecture: an FPGA-based, full-system functional simulator for a 16-way UltraSPARC III symmetric multiprocessor server, hosted on a single Xilinx Virtex-II XCV2P70 FPGA. On average, the simulator achieves a 38× speedup (and as high as 49×) over comparable software simulation across a suite of applications, including OLTP on a commercial database server. We also demonstrate the advantages of minimal-overhead FPGA-accelerated instrumentation through a CMP cache simulation technique that runs orders of magnitude faster than software.