Improving performance of simple cores by exploiting loop-level parallelism through value prediction and reconfiguration
Source
Conference On Computing Frontiers
Proceedings of the 6th ACM conference on Computing frontiers
Ischia, Italy
SESSION: Advanced architectures 2
Pages 151-160  
Year of Publication: 2009
ISBN:978-1-60558-413-3
Authors
Tameesh Suri  State University of New York at Binghamton, Binghamton, NY, USA
Aneesh Aggarwal  State University of New York at Binghamton, Binghamton, NY, USA
Sponsors
ACM: Association for Computing Machinery
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 13,   Downloads (12 Months): 71,   Citation Count: 0
DOI: http://doi.acm.org/10.1145/1531743.1531768

ABSTRACT

There is a growing trend towards designing simpler CPU cores that have considerable area, complexity, and power advantages. These cores are then leveraged in large-scale multicore processors or in SoCs for hand-held devices. The most significant limitation of such simple CPU cores is their lower performance. In this paper, we propose a technique to improve the performance of simple cores with minimal increase in complexity and area. In particular, we integrate a Reconfigurable Hardware Unit (RHU) that exploits loop-level parallelism to increase the core's overall performance. The RHU is reconfigured to execute instructions with highly predictable operand values from future iterations of loops. Our experiments show that the proposed architecture improves performance by about 51% on average across a wide range of applications, while incurring an area overhead of only about 5.6%.
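
The abstract does not say how operand values are predicted. As a rough illustration only, the sketch below assumes a simple stride-based predictor: it tracks the last value and stride seen at each instruction address and offers a prediction for a future loop iteration once the stride has repeated often enough. The class names, the confidence threshold, and the table organization are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    last_value: int = 0
    stride: int = 0
    confidence: int = 0  # saturating counter; resets when the stride changes


class StridePredictor:
    """Hypothetical stride value predictor, indexed by instruction address."""

    def __init__(self, threshold=2, max_confidence=3):
        self.table = {}  # maps instruction address (pc) -> StrideEntry
        self.threshold = threshold
        self.max_confidence = max_confidence

    def predict(self, pc, iterations_ahead=1):
        """Return the predicted value N iterations ahead, or None if unsure."""
        entry = self.table.get(pc)
        if entry is None or entry.confidence < self.threshold:
            return None
        return entry.last_value + entry.stride * iterations_ahead

    def update(self, pc, actual):
        """Train the predictor with the value actually produced at pc."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_value=actual)
            return
        new_stride = actual - entry.last_value
        if new_stride == entry.stride:
            entry.confidence = min(entry.confidence + 1, self.max_confidence)
        else:
            entry.stride = new_stride
            entry.confidence = 0
        entry.last_value = actual
```

For example, after observing the values 0, 4, 8, 12 at one address (as an array index might produce inside a loop), `predict` returns 16 for the next iteration and 20 for the one after. An instruction whose operands are predicted with this kind of confidence is the sort of candidate that could be executed ahead of time for a future iteration.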


