Scalable subgraph mapping for acyclic computation accelerators
Source
International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Seoul, Korea
SESSION: Compilation
Pages: 147 - 157
Year of Publication: 2006
ISBN: 1-59593-543-6
ABSTRACT
Computer architects are constantly faced with the need to improve performance and increase the efficiency of computation in their designs. To this end, it is increasingly common to see acyclic computation accelerators appear in embedded processor designs. One major problem with adding accelerators to a design is that it is difficult to generate high-quality code utilizing them. Hand-written assembly code is typical, and if compiler support does exist, it is implemented using only greedy algorithms. In this work, we investigate more thorough techniques for compiling to processors with acyclic accelerators. Whereas greedy solutions explore only one possible solution, the techniques presented in this paper explore the entire design space, when possible. Intelligent pruning methods are employed to ensure compilation is both tractable and scalable. Overall, our new compilation algorithms produce code that performs on average 10% better, and up to 32% better, than standard greedy methods. These algorithms also run in less than one second for more than 98% of basic blocks tested.
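The contrast the abstract draws between a greedy mapping and a pruned search of the whole design space can be illustrated with a toy sketch. This is not the paper's algorithm: the cost model (one instruction per selected subgraph plus one per uncovered operation), the names `greedy_cover` and `exhaustive_cover`, and the six-operation example are all assumptions made purely for illustration.

```python
from typing import FrozenSet, List, Sequence

Match = FrozenSet[str]  # a candidate subgraph, given as the set of ops it covers


def cost(ops: FrozenSet[str], chosen: Sequence[Match]) -> int:
    """Assumed cost model: 1 instruction per chosen match + 1 per uncovered op."""
    covered = set().union(*chosen) if chosen else set()
    return len(chosen) + len(ops - covered)


def greedy_cover(ops: FrozenSet[str], candidates: Sequence[Match]) -> List[Match]:
    """Take the largest non-overlapping match first; never reconsider a choice."""
    chosen: List[Match] = []
    covered: set = set()
    for m in sorted(candidates, key=len, reverse=True):
        if covered.isdisjoint(m):
            chosen.append(m)
            covered |= m
    return chosen


def exhaustive_cover(ops: FrozenSet[str], candidates: Sequence[Match]) -> List[Match]:
    """Try including/excluding every candidate, pruning with a simple lower bound.

    Assumes every candidate is a subset of ops.
    """
    best_chosen: List[Match] = []
    best_cost = len(ops)  # baseline: every op issued as its own instruction

    def search(i: int, chosen: List[Match], covered: FrozenSet[str]) -> None:
        nonlocal best_chosen, best_cost
        remaining = len(ops) - len(covered)
        # Lower bound: matches already committed, plus at least one more
        # instruction if any op is still uncovered.
        bound = len(chosen) + (1 if remaining else 0)
        if bound >= best_cost:
            return  # prune: this branch cannot beat the best known mapping
        if i == len(candidates):
            total = len(chosen) + remaining
            if total < best_cost:
                best_cost, best_chosen = total, list(chosen)
            return
        m = candidates[i]
        if covered.isdisjoint(m):
            chosen.append(m)
            search(i + 1, chosen, covered | m)  # branch 1: use this match
            chosen.pop()
        search(i + 1, chosen, covered)          # branch 2: skip this match

    search(0, [], frozenset())
    return best_chosen


# Hypothetical basic block with six operations and three candidate subgraphs.
OPS = frozenset("abcdef")
CANDIDATES = [frozenset("bcde"), frozenset("abc"), frozenset("def")]

print(cost(OPS, greedy_cover(OPS, CANDIDATES)))      # 3
print(cost(OPS, exhaustive_cover(OPS, CANDIDATES)))  # 2
```

In this example the greedy pass commits to the four-operation subgraph and leaves two operations uncovered, while the pruned search finds the two three-operation subgraphs that cover the whole block. The bound used here is deliberately crude; it only shows where pruning enters the search.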