Improving performance and reducing energy-delay with adaptive resource resizing for out-of-order embedded processors
Source
Language, Compiler and Tool Support for Embedded Systems
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Tucson, AZ, USA
SESSION: Architecture
Pages 71-78
Year of Publication: 2008
ISBN: 978-1-60558-104-0
Authors
Houman Homayoun, University of California Irvine, Irvine, CA, USA
Sudeep Pasricha, University of California Irvine, Irvine, CA, USA
Mohammad Makhzan, University of California Irvine, Irvine, CA, USA
Alex Veidenbaum, University of California Irvine, Irvine, CA, USA
ABSTRACT
While Ultra Deep Submicron (UDSM) CMOS scaling gives embedded processor designers an ample silicon budget to increase processor resources and improve performance, restrictions on the power budget and on practically achievable operating clock frequencies act as limiting factors. In this paper we show that simply increasing processor resource size is not effective in improving performance because of constraints on the achievable operating clock frequency. In response we propose two adaptive resource resizing techniques, L2RS and L2ML1RS, that adaptively resize resources by exploiting cache misses. Across SPEC2K benchmarks, L2ML1RS yields on average a 9.2% performance improvement (up to 34%) and a 3.8% overall energy-delay reduction. Applying L2RS resulted in a 6.8% performance improvement (up to 24%) and a 4.6% energy-delay reduction. We also present the circuit modifications required to apply these techniques, which are shown to be minimal.
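As a reading aid for the energy-delay figures quoted in the abstract, the following minimal sketch shows how a relative change in an energy-delay product follows from relative changes in energy and execution time. It assumes the common definition EDP = energy × delay; the abstract does not state which definition the paper uses. The function names and the input ratios are hypothetical and are not values taken from the paper.

```python
def edp_ratio(energy_ratio: float, delay_ratio: float) -> float:
    """Return EDP_new / EDP_old given E_new/E_old and D_new/D_old.

    Assumes EDP = energy * delay (an assumption; not stated in the abstract).
    """
    if energy_ratio <= 0 or delay_ratio <= 0:
        raise ValueError("ratios must be positive")
    return energy_ratio * delay_ratio


def percent_reduction(ratio: float) -> float:
    """Convert a new/old ratio into a percentage reduction (positive = lower)."""
    return (1.0 - ratio) * 100.0


if __name__ == "__main__":
    # Hypothetical inputs: energy rises by 5% while execution time falls by 10%.
    ratio = edp_ratio(energy_ratio=1.05, delay_ratio=0.90)
    print(f"EDP reduction: {percent_reduction(ratio):.1f}%")
```

Under this assumed definition, a technique can consume somewhat more energy and still lower the energy-delay product, provided it shortens execution time by a larger factor.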
REVIEW
Reviewer: Carlos Juiz
Homayoun et al. contend that increasing the size of processor resources is not an effective way to improve performance, due to constraints on achievable clock frequency during operation. In fact, while increasing the size of processor resources, s