Way Stealing: cache-assisted automatic instruction set extensions
Source: Annual ACM IEEE Design Automation Conference
Proceedings of the 46th Annual Design Automation Conference
San Francisco, California
Session: High-performance platforms: advances in system-level exploration and optimization
Pages 31-36  
Year of Publication: 2009
ISBN: 978-1-60558-497-3
Authors
Theo Kluter  Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Philip Brisk  Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Paolo Ienne  Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Edoardo Charbon  Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland and Delft University of Technology, Delft, The Netherlands
Sponsors
EDAC: Electronic Design Automation Consortium
SIGDA: ACM Special Interest Group on Design Automation
IEEE-CAS: Circuits & Systems
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1629911.1629923

ABSTRACT

This paper introduces Way Stealing, a simple architectural modification to a cache-based processor that increases data bandwidth to and from application-specific Instruction Set Extensions (ISEs). Way Stealing provides more bandwidth to the ISE logic than the register file alone and, because it adds no memory elements to the processor, requires no expensive coherence protocols. When enhanced with Way Stealing, ISE identification flows detect more opportunities for acceleration than prior methods; consequently, Way Stealing can speed up applications by up to 3.7x while reducing memory subsystem energy consumption by up to 67%, despite data-cache-related restrictions.

