A predictive decode filter cache for reducing power consumption in embedded processors
Source
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Volume 12, Issue 2 (April 2007)
Article No. 14
Year of Publication: 2007
ISSN: 1084-4309
Authors
Weiyu Tang, Center for Embedded Computer Systems, University of California, Irvine, Irvine, CA
Arun Kejariwal, Center for Embedded Computer Systems, University of California, Irvine, Irvine, CA
Alexander V. Veidenbaum, Center for Embedded Computer Systems, University of California, Irvine, Irvine, CA
Alexandru Nicolau, Center for Embedded Computer Systems, University of California, Irvine, Irvine, CA
ABSTRACT
With advances in semiconductor technology, power management has become an increasingly important constraint in processor design. In embedded processors, instruction fetch and decode consume more than 40% of processor power, which motivates power minimization techniques for the fetch and decode stages of the pipeline. The filter cache has been proposed as an architectural extension for this purpose: a cache placed between the CPU and the instruction cache (I-cache) that supplies the instruction stream with a shorter access time and lower power consumption. Its downside is a possible performance loss when it misses. In this article, we present a novel technique, the decode filter cache (DFC), for minimizing power consumption with minimal performance impact. The DFC stores decoded instructions, so a hit in the DFC eliminates both the instruction fetch and the subsequent decode, which reduces processor power. We present a runtime approach for predicting whether the next fetch source is present in the DFC; when a miss is predicted, we reduce the miss penalty by accessing the I-cache directly. We propose to classify instructions as cacheable or noncacheable, depending on the decode width. For efficient use of cache space, the DFC uses a sectored cache design so that both cacheable and noncacheable instructions can coexist in a DFC sector. Experimental results show that the DFC reduces processor power by 34% on average and that our next fetch prediction mechanism reduces the miss penalty by more than 91%.
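The fetch path described in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the article's design: the `DecodeFilterCache` class, the `simulate` function, the fully associative LRU organization, the one-cycle misprediction penalty, and the last-outcome predictor are all hypothetical choices made for illustration. The abstract does not specify the actual predictor, the cacheable/noncacheable classification, or the sectored layout, and none of these is modeled here.

```python
from collections import OrderedDict


class DecodeFilterCache:
    """Toy fully associative LRU store of decoded instructions (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # pc -> decoded instruction

    def lookup(self, pc):
        """Return the decoded instruction on a hit, or None on a miss."""
        if pc in self.entries:
            self.entries.move_to_end(pc)  # mark as most recently used
            return self.entries[pc]
        return None

    def fill(self, pc, decoded):
        """Insert or refresh an entry, evicting the least recently used one."""
        self.entries[pc] = decoded
        self.entries.move_to_end(pc)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def simulate(trace, capacity=4):
    """Count DFC hits, I-cache fetches, and misprediction penalty cycles."""
    dfc = DecodeFilterCache(capacity)
    predict_hit = False  # assumed last-outcome predictor, not the paper's
    stats = {"dfc_hits": 0, "icache_fetches": 0, "penalty_cycles": 0}
    for pc in trace:
        if predict_hit:
            if dfc.lookup(pc) is not None:
                # Predicted hit, actual hit: fetch and decode are both skipped.
                stats["dfc_hits"] += 1
                was_present = True
            else:
                # Predicted hit, actual miss: pay an assumed one-cycle penalty,
                # then fetch from the I-cache and decode.
                stats["penalty_cycles"] += 1
                stats["icache_fetches"] += 1
                dfc.fill(pc, ("decoded", pc))
                was_present = False
        else:
            # Predicted miss: access the I-cache directly, with no penalty.
            stats["icache_fetches"] += 1
            # Simplification: the model peeks at the DFC to train the predictor.
            was_present = pc in dfc.entries
            dfc.fill(pc, ("decoded", pc))
        predict_hit = was_present
    return stats


# A three-instruction loop executed four times.
loop_stats = simulate([0, 4, 8] * 4)
```

In this toy trace, the first loop iteration fills the DFC through the I-cache, and once the predictor observes that an address was already present, later iterations are served from the DFC. A trace such as `[0, 0, 4]` exercises the mispredicted-hit path and accrues one penalty cycle.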