Scalable subgraph mapping for acyclic computation accelerators
Source: International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Seoul, Korea
Session: Compilation
Pages: 147-157
Year of Publication: 2006
ISBN: 1-59593-543-6
Authors
Nathan Clark  University of Michigan - Ann Arbor, MI
Amir Hormati  University of Michigan - Ann Arbor, MI
Scott Mahlke  University of Michigan - Ann Arbor, MI
Sami Yehia  ARM Ltd., Cambridge, United Kingdom
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
ACM: Association for Computing Machinery
SIGBED: ACM Special Interest Group on Embedded Systems
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1176760.1176779

ABSTRACT

Computer architects are constantly faced with the need to improve performance and increase the efficiency of computation in their designs. To this end, it is increasingly common to see acyclic computation accelerators appear in embedded processor designs. One major problem with adding accelerators to a design is that it is difficult to generate high-quality code that uses them. Hand-written assembly code is typical, and if compiler support does exist, it is implemented using only greedy algorithms. In this work, we investigate more thorough techniques for compiling to processors with acyclic accelerators. Whereas greedy methods explore only one possible solution, the techniques presented in this paper explore the entire design space when possible. Intelligent pruning methods are employed to ensure that compilation is both tractable and scalable. Overall, our new compilation algorithms produce code that performs on average 10%, and up to 32%, better than standard greedy methods. These algorithms also run in less than one second for more than 98% of the basic blocks tested.

