Reducing data cache energy consumption via cached load/store queue
Source
International Symposium on Low Power Electronics and Design
Proceedings of the 2003 international symposium on Low power electronics and design
Seoul, Korea
SESSION: Power efficient cache design
Pages: 252-257
Year of Publication: 2003
ISBN: 1-58113-682-X
ABSTRACT
High-performance processors use a large set-associative L1 data cache with multiple ports. As clock speeds and cache sizes increase, such a cache consumes a significant percentage of the total processor energy. This paper proposes a method of saving energy by reducing the number of data cache accesses. It does so by modifying the Load/Store Queue (LSQ) design to allow "caching" of previously accessed data values on both loads and stores after the corresponding memory access instruction has been committed. A 32-entry modified LSQ design is shown to allow an average of 38.5% of the loads in the SpecINT95 benchmarks and 18.9% in the SpecFP95 benchmarks to get their data from the LSQ. The reduction in the number of L1 cache accesses results in up to a 40% reduction in L1 data cache energy consumption and up to a 16% improvement in the energy-delay product, while requiring almost no additional hardware or complex control logic.
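The idea of serving loads from retained LSQ entries can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class name `CachedLSQ`, the oldest-entry replacement policy, the dictionary standing in for the L1 data cache, and the choice to count every store as a cache write are all hypothetical and are not taken from the abstract.

```python
from collections import OrderedDict


class CachedLSQ:
    """Toy model: entries stay in the queue after commit and can serve later loads."""

    def __init__(self, entries=32):
        self.entries = entries
        self.queue = OrderedDict()  # address -> value, oldest entry first
        self.cache_reads = 0        # loads that had to access the L1 data cache
        self.cache_writes = 0       # stores written to the L1 data cache
        self.lsq_hits = 0           # loads served from the LSQ instead

    def _retain(self, addr, value):
        # Assumed policy: evict the oldest entry when the queue is full.
        if addr in self.queue:
            del self.queue[addr]
        elif len(self.queue) >= self.entries:
            self.queue.popitem(last=False)
        self.queue[addr] = value

    def load(self, addr, memory):
        if addr in self.queue:
            self.lsq_hits += 1      # data comes from the LSQ, no cache access
            return self.queue[addr]
        self.cache_reads += 1       # fall back to the data cache
        value = memory.get(addr, 0)
        self._retain(addr, value)
        return value

    def store(self, addr, value, memory):
        self.cache_writes += 1      # assumption: stores still update the cache
        memory[addr] = value
        self._retain(addr, value)


memory = {0x100: 7}                 # stand-in for the L1 data cache contents
lsq = CachedLSQ(entries=2)
lsq.load(0x100, memory)             # not in the LSQ yet: one cache read
lsq.load(0x100, memory)             # served from the retained load entry
lsq.store(0x200, 5, memory)         # store value is retained as well
lsq.load(0x200, memory)             # served from the retained store entry
print(lsq.cache_reads, lsq.lsq_hits, lsq.cache_writes)
```

In this model, the fraction of loads counted in `lsq_hits` plays the role of the percentages quoted above (loads that get their data from the LSQ), and each such load is one L1 data cache access avoided.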