Soft vector processors vs FPGA custom hardware: measuring and reducing the gap
Source
International Symposium on Field Programmable Gate Arrays
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Monterey, California, USA
POSTER SESSION: Processors & CAD tools
Page 277
Year of Publication: 2009
ISBN:978-1-60558-410-2
Authors
Peter Yiannacouras  University of Toronto, Toronto, Canada
J. Gregory Steffan  University of Toronto, Toronto, Canada
Jonathan Rose  University of Toronto, Toronto, Canada
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1508128.1508178

ABSTRACT

Soft processors are often used in FPGA-based systems because of their ease of use, but for a given computation there is a significant area/performance gap between C code executing on a soft processor and a custom FPGA hardware implementation. Recent research has demonstrated that soft processors augmented with support for vector instructions provide significant improvements in performance and scalability for data-parallel workloads. In this work, using an FPGA platform equipped with DDR memory and executing data-parallel benchmarks from the industry-standard EEMBC suite, we measure the area/performance gaps between (i) C programs executing on a scalar soft processor, (ii) hand-vectorized programs executing on a soft vector processor, and (iii) custom FPGA hardware. We demonstrate that the wall-clock performance gap between scalar-executed C and custom hardware can be drastically reduced using our improved soft vector processors, even though they are still clocked 3x slower than custom hardware. We identify loop overhead, data delivery, and exact resource usage as three key advantages of custom hardware, and we propose to mitigate them in our soft vector processor by, respectively, decoupling pipelines; tuning cache design and supporting prefetching; and automatically eliminating unused instructions and datapath width. We show that together these improvements increase performance by 3x and reduce the area of the fastest soft vector processor by 2x, significantly reducing the need for designers to resort to more challenging custom hardware implementations.
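
As a rough illustration of the loop-overhead point in the abstract, the C sketch below contrasts a scalar data-parallel loop with a strip-mined version of the same loop, which is the usual first step when hand-vectorizing. It is not taken from the paper: the kernel, the function names, and the vector length `VLEN` are all hypothetical, and the inner loop only stands in for what a soft vector processor would issue as vector instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of elements handled per vector operation.
   This value is an assumption for illustration, not a figure from the paper. */
#define VLEN 8

/* Scalar form: one element per iteration, so the loop overhead
   (increment, compare, branch) is paid n times. */
void scale_add_scalar(int32_t *dst, const int32_t *a, const int32_t *b,
                      int32_t k, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = k * a[i] + b[i];
    }
}

/* Strip-mined form: the outer loop advances in chunks of up to VLEN
   elements, so its overhead is paid roughly n / VLEN times. The inner
   loop models the lanes of one vector operation; on real vector
   hardware it would be replaced by vector load, multiply, add, store. */
void scale_add_strip_mined(int32_t *dst, const int32_t *a, const int32_t *b,
                           int32_t k, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i += VLEN) {
        /* Shorten the final chunk when n is not a multiple of VLEN. */
        size_t vl = (n - i < VLEN) ? (n - i) : VLEN;
        for (size_t lane = 0; lane < vl; lane++) {
            dst[i + lane] = k * a[i + lane] + b[i + lane];
        }
    }
}

/* Number of outer-loop iterations the strip-mined form executes. */
size_t strip_count(size_t n)
{
    return (n + VLEN - 1) / VLEN;
}
```

Both functions compute the same result; only the number of outer-loop iterations differs. Custom hardware can remove this overhead entirely, which is the kind of gap the abstract proposes to narrow by decoupling pipelines.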
