Reducing data cache energy consumption via cached load/store queue
Source: International Symposium on Low Power Electronics and Design
Proceedings of the 2003 International Symposium on Low Power Electronics and Design
Seoul, Korea
Session: Power efficient cache design
Pages: 252-257
Year of Publication: 2003
ISBN: 1-58113-682-X
Authors
Dan Nicolaescu, University of California, Irvine, CA
Alex Veidenbaum, University of California, Irvine, CA
Alex Nicolau, University of California, Irvine, CA
Sponsors
ACM: Association for Computing Machinery
SIGDA: ACM Special Interest Group on Design Automation
Publisher
ACM, New York, NY, USA

DOI: http://doi.acm.org/10.1145/871506.871569

ABSTRACT

High-performance processors use a large set-associative L1 data cache with multiple ports. As clock speeds and cache sizes increase, such a cache consumes a significant percentage of the total processor energy. This paper proposes a method of saving energy by reducing the number of data cache accesses. It does so by modifying the Load/Store Queue (LSQ) design to allow "caching" of previously accessed data values on both loads and stores after the corresponding memory access instruction has been committed. A 32-entry modified LSQ design is shown to allow an average of 38.5% of the loads in the SpecINT95 benchmarks and 18.9% in the SpecFP95 benchmarks to get their data from the LSQ. The reduction in the number of L1 cache accesses results in up to a 40% reduction in L1 data cache energy consumption and up to a 16% improvement in the energy-delay product, while requiring almost no additional hardware or complex control logic.
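
The idea of letting committed LSQ entries serve later loads can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the design evaluated in the paper: the class name `CachedLSQ`, the oldest-entry replacement policy, the dictionary standing in for the L1 data cache, and the counters are all hypothetical choices made for illustration. It ignores ports, in-flight (uncommitted) instructions, access sizes, and timing.

```python
from collections import OrderedDict


class CachedLSQ:
    """Toy model of a load/store queue whose entries stay valid after
    commit, so a later load to the same address can skip the L1 cache.
    Replacement policy (evict the oldest entry) is an assumption."""

    def __init__(self, entries=32):
        self.entries = entries
        self.queue = OrderedDict()   # address -> value, oldest first
        self.lsq_hits = 0            # loads served from the queue
        self.cache_accesses = 0      # accesses that reached the L1 model

    def _insert(self, addr, value):
        if addr in self.queue:
            del self.queue[addr]             # refresh an existing entry
        elif len(self.queue) >= self.entries:
            self.queue.popitem(last=False)   # evict the oldest entry
        self.queue[addr] = value

    def store(self, addr, value, memory):
        # A store still writes the cache, but its value is kept in the
        # queue so that later loads can reuse it.
        memory[addr] = value
        self.cache_accesses += 1
        self._insert(addr, value)

    def load(self, addr, memory):
        if addr in self.queue:
            self.lsq_hits += 1               # no L1 access needed
            return self.queue[addr]
        self.cache_accesses += 1             # fall back to the L1 model
        value = memory.get(addr, 0)
        self._insert(addr, value)
        return value
```

In this model, the fraction of loads served from the queue is `lsq_hits` divided by the total number of loads; the abstract's 38.5% and 18.9% figures are measurements of the authors' actual design on real benchmarks and cannot be reproduced by this sketch.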



