Traversal caches: a first step towards FPGA acceleration of pointer-based data structures
Source
International Conference on Hardware Software Codesign
Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Atlanta, GA, USA
SESSION: Performance enhancement-new techniques for FPGAs and partitioning
Pages 61-66
Year of Publication: 2008
ISBN: 978-1-60558-470-6
Authors
Greg Stitt  University of Florida, Gainesville, FL, USA
Gaurav Chaudhari  University of Florida, Gainesville, FL, USA
James Coole  University of Florida, Gainesville, FL, USA
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
SIGBED: ACM Special Interest Group on Embedded Systems
ACM: Association for Computing Machinery
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1450135.1450150

ABSTRACT

Field-programmable gate arrays (FPGAs) often achieve order-of-magnitude speedups over microprocessors, but they have typically been unable to improve the performance of applications with irregular memory access patterns, such as traversals of pointer-based data structures. Because these data structures are so common, the applicability and widespread success of FPGAs have been limited. In this paper, we introduce the traversal cache framework, a first step towards improving the performance of FPGA applications that use pointer-based data structures. The traversal cache is a local FPGA memory that stores repeated traversals of pointer-based data structures, allowing these traversals to be streamed efficiently into the FPGA. Although the cache is generally limited to improving applications that exhibit repeated traversals, we show that many applications in fact have this characteristic. Furthermore, we show that only a few repetitions are needed to achieve performance improvements. We present experimental results showing that FPGA implementations using the traversal cache framework achieve speedups ranging from 7x to 29x compared to pointer-based software on a 3.2 GHz Xeon.
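
The idea of storing a repeated traversal so that later passes can be streamed can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the names `Node`, `build_list`, and `TraversalCache`, the choice of a singly linked list, keying the cache on the start node, and the explicit `invalidate` step are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """One element of a singly linked list (a pointer-based structure)."""
    value: int
    next: Optional["Node"] = None


def build_list(values: List[int]) -> Optional[Node]:
    """Build a linked list holding `values` in order and return its head."""
    head: Optional[Node] = None
    for v in reversed(values):
        head = Node(v, head)
    return head


class TraversalCache:
    """Hypothetical software model of a traversal cache.

    The first traversal from a given start node chases pointers and packs
    the visited values into a contiguous list. Repeated traversals from the
    same start node read that packed list sequentially instead.
    """

    def __init__(self) -> None:
        self._packed: Dict[int, List[int]] = {}
        self.pointer_walks = 0    # traversals that had to follow pointers
        self.streamed_reads = 0   # traversals served from the packed copy

    def traverse(self, head: Optional[Node]) -> List[int]:
        key = id(head)  # assumption: a traversal is identified by its start node
        if key in self._packed:
            self.streamed_reads += 1
            return self._packed[key]
        # Miss: walk the pointers once and record the visit order.
        self.pointer_walks += 1
        packed: List[int] = []
        node = head
        while node is not None:
            packed.append(node.value)
            node = node.next
        self._packed[key] = packed
        return packed

    def invalidate(self, head: Optional[Node]) -> None:
        """Drop the packed copy, e.g. after the list has been modified."""
        self._packed.pop(id(head), None)
```

In this model, calling `traverse` four times on the same list follows pointers once and serves the remaining three passes from the contiguous copy, which is the access pattern that suits streaming. The model says nothing about actual speedups, which depend on the hardware.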

