ABSTRACT
The memory subsystem, including address translation and cache accesses, consumes a major portion of a processor's overall energy. In this paper, we address memory energy with a streamlined architectural partitioning technique that reduces energy consumption in the memory subsystem without compromising performance. The technique decouples d-TLB lookups and data cache accesses into discrete reference substreams (stack, global static, and heap) based on the semantic regions defined by programming languages and software conventions. The distinct access behavior and locality characteristics of each substream are analyzed and exploited for power reduction. Our results show that energy consumption in the d-TLB and the data cache can be reduced by an average of 35%. Furthermore, selectively multi-porting the semantic-aware d-TLBs and data caches saves an average of 46% energy relative to their monolithic counterparts.
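As an illustration of splitting a reference stream by semantic region, the following minimal sketch sorts data addresses into stack, global static, and heap substreams. The abstract does not say how references are steered in hardware, so the address-range test, the boundary constants, and the names `classify` and `split_stream` are all assumptions made for this example, not the paper's mechanism.

```python
from enum import Enum


class Region(Enum):
    STACK = "stack"
    GLOBAL = "global static"
    HEAP = "heap"


# Hypothetical address-space boundaries, chosen only for illustration.
HEAP_BASE = 0x2000_0000    # addresses below this are treated as global static
STACK_LIMIT = 0x7000_0000  # addresses at or above this are treated as stack


def classify(addr: int) -> Region:
    """Map a virtual address to a semantic region (assumed range-based rule)."""
    if addr >= STACK_LIMIT:
        return Region.STACK
    if addr >= HEAP_BASE:
        return Region.HEAP
    return Region.GLOBAL


def split_stream(addresses):
    """Split one reference stream into per-region substreams, keeping order."""
    streams = {region: [] for region in Region}
    for addr in addresses:
        streams[classify(addr)].append(addr)
    return streams


if __name__ == "__main__":
    trace = [0x7FFF_0000, 0x1000_0010, 0x2000_0040, 0x7FFF_0008]
    for region, addrs in split_stream(trace).items():
        print(region.value, [hex(a) for a in addrs])
```

In a partitioned design, each substream would then be served by its own smaller d-TLB and data cache, which is where the energy saving would come from.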