Two-level microprocessor-accelerator partitioning
Source: Design, Automation, and Test in Europe
Proceedings of the conference on Design, Automation and Test in Europe
Nice, France
Session: Design space exploration and nano-technologies for reconfigurable computing
Pages: 313 - 318  
Year of Publication: 2007
ISBN:978-3-9810801-2-4
Authors
Scott Sirowy  University of California, Riverside
Yonghui Wu  University of California, Riverside
Stefano Lonardi  University of California, Riverside
Frank Vahid  University of California, Riverside and University of California, Irvine
Sponsors
IEEE Council on Electronic Design Automation (CEDA)
SIGDA: ACM Special Interest Group on Design Automation
The EDA Consortium
EDAA: European Design and Automation Association
RAS
The IEEE Computer Society TTTC
ECSI
Publisher
EDA Consortium  San Jose, CA, USA

ABSTRACT

The integration of microprocessors and field-programmable gate array (FPGA) fabric on a single chip increases both the utility and necessity of tools that automatically move software functions from the microprocessor to accelerators on the FPGA to improve performance or energy. Such hardware/software partitioning for modern FPGAs involves the problem of partitioning functions among two levels of accelerator groups -- tightly-coupled accelerators that have fast single-clock-cycle memory access to the microprocessor's memory, and loosely-coupled accelerators that access memory through a bridge to avoid slowing the main clock period with their longer critical paths. We introduce this new two-level accelerator-partitioning problem, and we describe a novel optimal dynamic programming algorithm to solve the problem. By making use of the size constraint imposed by FPGAs, the algorithm has what is effectively quadratic runtime complexity, running in just a few seconds for examples with up to 25 accelerators, obtaining an average performance improvement of 35% compared to a traditional single-level bus architecture.
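
The abstract does not give the cost model or the recurrence, so the following is only a minimal sketch of how a size-constrained dynamic program for this kind of problem could look. It is not the authors' algorithm. Every name and parameter is a hypothetical illustration (`Func`, `two_level_partition`, `sw_cycles`, `tight_cycles`, `loose_time`, `crit_path`), and the timing model is an assumption: a tightly-coupled accelerator runs on the main clock and stretches the clock period to its critical path, while a loosely-coupled accelerator costs a fixed time behind the bridge. The sketch tries each candidate clock period and, for each one, runs a knapsack-style table over the FPGA area budget.

```python
from collections import namedtuple

# Hypothetical per-function data (all fields are assumptions for illustration):
#   sw_cycles    cycles when the function stays in software
#   tight_cycles cycles when run on a tightly-coupled accelerator (main clock)
#   loose_time   fixed time (ns) on a loosely-coupled accelerator, bridge included
#   area         integer FPGA area units used by the accelerator
#   crit_path    critical path (ns); a tight accelerator forces period >= crit_path
Func = namedtuple("Func", "sw_cycles tight_cycles loose_time area crit_path")


def two_level_partition(funcs, base_cycles, base_period, area_budget):
    """Return (total_time_ns, assignment) minimizing time under the area budget.

    assignment holds one of "sw", "tight", "loose" per function.
    """
    # The main clock period can only be the base period or some critical path.
    periods = sorted({base_period} |
                     {f.crit_path for f in funcs if f.crit_path > base_period})
    best_time, best_assign = float("inf"), ()
    for period in periods:
        # dp[a] = (best time, assignment) using at most `a` area units,
        # given that the main clock period is fixed to `period`.
        dp = [(period * base_cycles, ())] * (area_budget + 1)
        for f in funcs:
            new_dp = []
            for a in range(area_budget + 1):
                time, assign = dp[a]
                options = [(time + period * f.sw_cycles, assign + ("sw",))]
                if f.area <= a:
                    prev_time, prev_assign = dp[a - f.area]
                    options.append((prev_time + f.loose_time,
                                    prev_assign + ("loose",)))
                    # Tight coupling is allowed only if it fits this period.
                    if f.crit_path <= period:
                        options.append((prev_time + period * f.tight_cycles,
                                        prev_assign + ("tight",)))
                new_dp.append(min(options, key=lambda o: o[0]))
            dp = new_dp
        time, assign = dp[area_budget]
        if time < best_time:
            best_time, best_assign = time, assign
    return best_time, best_assign


if __name__ == "__main__":
    example = [
        Func(sw_cycles=1000, tight_cycles=100, loose_time=3000, area=4, crit_path=10),
        Func(sw_cycles=800, tight_cycles=50, loose_time=2000, area=3, crit_path=15),
        Func(sw_cycles=500, tight_cycles=100, loose_time=1500, area=5, crit_path=10),
    ]
    print(two_level_partition(example, base_cycles=2000, base_period=10,
                              area_budget=8))
```

Under these assumptions the table has one row per function and one column per area unit, and it is rebuilt once per candidate period, so the work grows as (number of functions)² × (area budget). With the area budget bounded by the size of the FPGA, that is quadratic in the number of candidate accelerators, which is in the spirit of the "effectively quadratic" runtime the abstract attributes to the size constraint.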

