ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs
Source
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Volume 2, Issue 2 (June 2009)
Article No. 15
Year of Publication: 2009
ISSN:1936-7406
Authors
Eric S. Chung, Computer Architecture Laboratory at Carnegie Mellon
Michael K. Papamichael, Computer Architecture Laboratory at Carnegie Mellon
Eriko Nurvitadhi, Computer Architecture Laboratory at Carnegie Mellon
James C. Hoe, Computer Architecture Laboratory at Carnegie Mellon
Ken Mai, Computer Architecture Laboratory at Carnegie Mellon
Babak Falsafi, Parallel Systems Architecture Laboratory, École Polytechnique Fédérale de Lausanne
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1534916.1534925

ABSTRACT

Functional full-system simulators are powerful and versatile research tools for accelerating architectural exploration and advanced software development. Their main shortcoming is limited throughput when simulating large multiprocessor systems with hundreds or thousands of processors or when instrumentation is introduced. We propose the ProtoFlex simulation architecture, which uses FPGAs to accelerate full-system multiprocessor simulation and to facilitate high-performance instrumentation. Prior FPGA approaches that prototype a complete system in hardware are either too complex when scaling to large-scale configurations or require significant effort to provide full-system support. In contrast, ProtoFlex virtualizes the execution of many logical processors onto a consolidated number of multiple-context execution engines on the FPGA. Through virtualization, the number of engines can be judiciously scaled, as needed, to deliver on necessary simulation performance at a large savings in complexity. Further, to achieve low-complexity full-system support, a hybrid simulation technique called transplanting allows implementing in the FPGA only the frequently encountered behaviors, while a software simulator preserves the abstraction of a complete system.

We have created a first instance of the ProtoFlex simulation architecture, which is an FPGA-based, full-system functional simulator for a 16-way UltraSPARC III symmetric multiprocessor server, hosted on a single Xilinx Virtex-II XCV2P70 FPGA. On average, the simulator achieves a 38× speedup (and as high as 49×) over comparable software simulation across a suite of applications, including OLTP on a commercial database server. We also demonstrate the advantages of minimal-overhead FPGA-accelerated instrumentation through a CMP cache simulation technique that runs orders of magnitude faster than software.
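
The two ideas in the abstract, virtualizing many logical processors onto a few multiple-context engines and transplanting rare behaviors to a software simulator, can be illustrated with a toy software model. This is only a sketch under stated assumptions and not the ProtoFlex implementation: the names `Context`, `FAST_OPS`, `software_fallback`, and `run`, the three-operation "fast" instruction set, and the round-robin interleaving policy are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical set of operations the "engine" handles directly.
# Anything else stands in for a rarely encountered behavior.
FAST_OPS = {"add", "load", "store"}


@dataclass
class Context:
    """State of one simulated logical processor (illustrative only)."""
    cpu_id: int
    program: list
    pc: int = 0
    transplants: int = 0

    def done(self) -> bool:
        return self.pc >= len(self.program)


def software_fallback(ctx: Context, op: str) -> None:
    """Stand-in for handing one operation to a software simulator."""
    ctx.transplants += 1


def run(contexts: list, num_engines: int) -> list:
    """Interleave many contexts on a smaller number of engines.

    Contexts are statically assigned to engines round-robin; each engine
    then issues one operation per resident context per pass.
    Returns a trace of (engine_id, cpu_id, op, where) tuples.
    """
    engines = [contexts[i::num_engines] for i in range(num_engines)]
    trace = []
    while any(not c.done() for c in contexts):
        for engine_id, resident in enumerate(engines):
            for ctx in resident:
                if ctx.done():
                    continue
                op = ctx.program[ctx.pc]
                if op in FAST_OPS:
                    where = "engine"
                else:
                    # Unsupported behavior: "transplant" to software.
                    software_fallback(ctx, op)
                    where = "software"
                trace.append((engine_id, ctx.cpu_id, op, where))
                ctx.pc += 1
    return trace


if __name__ == "__main__":
    cpus = [Context(i, ["add", "load", "trap", "store"]) for i in range(4)]
    for entry in run(cpus, num_engines=2):
        print(entry)
```

In this toy model, changing `num_engines` trades throughput against the amount of engine logic without changing the number of simulated processors, which mirrors the scaling argument in the abstract only loosely; a real design would also have to model memory, I/O, and the cost of each transplant.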

