ABSTRACT
In recent years, application-specific instruction set processors (ASIPs) have become increasingly common in embedded system designs. ASIPs allow hardware computation accelerators to be customized for an application domain. Together with instruction set extensions (ISEs), these customized accelerators can significantly improve the performance of embedded processors, as previous research and industrial products have demonstrated. However, the accelerators in an ASIP can only speed up applications that are compiled with ISEs; applications compiled without ISEs cannot benefit from the hardware accelerators at all. In this paper, we propose using software dynamic binary translation to overcome this problem, i.e., to utilize the accelerators dynamically. Unlike a static approach, dynamic accelerator utilization poses many new problems. This paper comprehensively explores the techniques and design choices for solving these problems and demonstrates their effectiveness through experimental results.