ABSTRACT
This poster describes CHiMPS, a toolflow that aims to let software developers program hybrid CPU-FPGA platforms using familiar tools, languages, and techniques. CHiMPS starts with C and produces a specialized spatial dataflow architecture that supports coherent caches and the shared-memory programming model. The toolflow is designed both to abstract away the complex details of data movement and separate memories on hybrid platforms and to take advantage of memory-management and computation techniques unique to reconfigurable hardware. This poster focuses on the memory design for CHiMPS, particularly the use of numerous small caches customized for different phases of program execution, and also addresses area vs. performance tradeoffs for various configurations. Compared to running only on the CPU, applications compiled with CHiMPS show speedups of more than 36x on simple compute-intensive kernels and 4.3x on the difficult-to-parallelize STSWM application, without any special optimizations. The toolflow supports full ANSI-C and produces hardware that runs on platforms expected to be available within one year.
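The benefit of many small caches, each serving one part of a program, can be illustrated with a toy model. The sketch below is our own assumption and not the CHiMPS cache design: it streams two arrays `a` and `b` through either one shared direct-mapped cache or two private direct-mapped caches of the same total capacity. The line size (`LINE_WORDS`), the cache sizes (8 lines vs. 2 x 4 lines), and the base addresses are hypothetical values, chosen so that `a[i]` and `b[i]` collide in the shared cache.

```python
LINE_WORDS = 4  # words per cache line (hypothetical value for illustration)


class DirectMappedCache:
    """Toy direct-mapped cache that only counts hits and misses."""

    def __init__(self, num_lines: int) -> None:
        self.num_lines = num_lines
        self.lines = [None] * num_lines  # line address currently held in each slot
        self.hits = 0
        self.misses = 0

    def access(self, word_addr: int) -> None:
        line = word_addr // LINE_WORDS  # which cache line the word belongs to
        slot = line % self.num_lines    # direct-mapped: each line has exactly one slot
        if self.lines[slot] == line:
            self.hits += 1
        else:
            self.misses += 1
            self.lines[slot] = line     # fill on miss, evicting the previous line


def simulate(n: int = 64, base_a: int = 0, base_b: int = 1024) -> dict:
    """Read a[i] and b[i] together, as in a loop computing sum += a[i] * b[i]."""
    shared = DirectMappedCache(8)   # one 8-line cache shared by both arrays
    cache_a = DirectMappedCache(4)  # two private 4-line caches: same total capacity
    cache_b = DirectMappedCache(4)
    for i in range(n):
        shared.access(base_a + i)
        shared.access(base_b + i)
        cache_a.access(base_a + i)
        cache_b.access(base_b + i)
    return {
        "shared": (shared.hits, shared.misses),
        "private": (cache_a.hits + cache_b.hits, cache_a.misses + cache_b.misses),
    }


print(simulate())  # {'shared': (0, 128), 'private': (96, 32)}
```

With these hypothetical addresses, every access to the shared cache evicts the line the other array just loaded, so all 128 accesses miss, while each private cache misses only once per 4-word line. The model omits associativity, coherence, and hardware cost, so it shows only why dedicating small caches to individual data streams can help, not how CHiMPS sizes or places them.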