ABSTRACT
We present a dynamic optimization technique, thread warping, that uses a single processor on a multiprocessor system to dynamically synthesize threads into custom accelerator circuits on FPGAs (field-programmable gate arrays). Building on dynamic synthesis for single-processor single-thread systems, known as warp processing, thread warping improves the performance of multiprocessor systems by speeding up individual threads and by allowing more threads to execute concurrently. Furthermore, thread warping maintains the important separation of function from architecture, enabling portability of applications to architectures with different quantities of microprocessors and FPGAs, an advantage not shared by static compilation/synthesis approaches. We introduce a framework of architecture, CAD tools, and operating system that together support thread warping. We summarize experiments on an extensive architectural simulation framework we developed, showing application speedups of 4x to 502x, averaging 130x, compared to a multiprocessor system having four ARM11 microprocessors, for eight benchmark applications. Even compared to a 64-processor system, thread warping achieves an 11x speedup.