Improving performance of simple cores by exploiting loop-level parallelism through value prediction and reconfiguration
Source
Conference On Computing Frontiers
Proceedings of the 6th ACM conference on Computing frontiers
Ischia, Italy
SESSION: Advanced architectures 2
Pages 151-160
Year of Publication: 2009
ISBN: 978-1-60558-413-3
Authors
Tameesh Suri, State University of New York at Binghamton, Binghamton, NY, USA
Aneesh Aggarwal, State University of New York at Binghamton, Binghamton, NY, USA
ABSTRACT
There is a growing trend towards designing simpler CPU cores that have considerable area, complexity, and power advantages. These cores are then leveraged in large-scale multicore processors or in SoCs for hand-held devices. The most significant limitation of such simple CPU cores is their lower performance. In this paper, we propose a technique to improve the performance of simple cores with minimal increase in complexity and area. In particular, we integrate a Reconfigurable Hardware Unit (RHU) that exploits loop-level parallelism to increase the core's overall performance. The RHU is reconfigured to execute instructions with highly predictable operand values from the future iterations of loops. Our experiments show that the proposed architecture improves performance by an average of about 51% across a wide range of applications, while incurring an area overhead of only about 5.6%.
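The abstract does not say how operand values are predicted, so the following is only an illustrative sketch of one common approach, a stride-based value predictor, and is not the paper's mechanism. The class name `StridePredictor`, its table layout, and the confidence threshold are all assumptions made for this example. It shows how an operand that changes by a constant amount per loop iteration (for example an induction variable or an array address) could be predicted several iterations ahead, which is the kind of "highly predictable operand value" the abstract refers to.

```python
class StridePredictor:
    """Hypothetical per-instruction stride predictor (illustration only)."""

    def __init__(self, confidence_threshold=2):
        # Maps an instruction address (pc) to [last_value, stride, confidence].
        self.table = {}
        self.threshold = confidence_threshold

    def update(self, pc, value):
        """Record the operand value observed for the instruction at pc."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = [value, 0, 0]
            return
        last, stride, confidence = entry
        new_stride = value - last
        # Confidence grows while the stride repeats and resets when it changes.
        confidence = confidence + 1 if new_stride == stride else 0
        self.table[pc] = [value, new_stride, confidence]

    def predict(self, pc, iterations_ahead=1):
        """Return the predicted value, or None if the entry is not confident."""
        entry = self.table.get(pc)
        if entry is None or entry[2] < self.threshold:
            return None
        last, stride, _ = entry
        return last + stride * iterations_ahead


predictor = StridePredictor()
for observed in (0, 4, 8, 12):       # operand values seen in four iterations
    predictor.update(0x40, observed)
print(predictor.predict(0x40, iterations_ahead=2))
```

In this sketch a prediction is only issued once the same stride has been seen repeatedly; an unpredictable operand yields `None`, which stands in for "do not execute this instruction speculatively".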