An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Source: Architectural Support for Programming Languages and Operating Systems
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems
San Jose, California
Session: Computer architecture
Pages: 211-222
Year of Publication: 2002
ISBN: 1-58113-574-2
Authors
Changkyu Kim  The University of Texas, Austin
Doug Burger  The University of Texas, Austin
Stephen W. Keckler  The University of Texas, Austin
Sponsors
SIGPLAN: ACM Special Interest Group on Programming Languages
SIGOPS: ACM Special Interest Group on Operating Systems
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 31,   Downloads (12 Months): 201,   Citation Count: 62
DOI: http://doi.acm.org/10.1145/605397.605420

ABSTRACT

Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a line's physical location within the cache. Consequently, cache access times will become a continuum of latencies rather than a single discrete latency. This non-uniformity can be exploited to provide faster access to cache lines in the portions of the cache that reside closer to the processor. In this paper, we evaluate a series of cache designs that provide fast hits to multi-megabyte cache memories. We first propose physical designs for these Non-Uniform Cache Architectures (NUCAs). We extend these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache. We show that, for multi-megabyte level-two caches, an adaptive, dynamic NUCA design achieves 1.5 times the IPC of a Uniform Cache Architecture of any size, outperforms the best static NUCA scheme by 11%, outperforms the best three-level hierarchy--while using less silicon area--by 13%, and comes within 13% of an ideal minimal hit latency solution.
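
The two ideas in the abstract, hit time that depends on a line's physical location and migration of frequently used lines toward the processor, can be illustrated with a toy model. The following Python sketch is not the paper's design or simulator: the cycle costs, the single column of banks holding one line each, and the swap-on-hit migration rule are all assumptions chosen only to make the behaviour visible, and the names `hit_latency` and `BankColumn` are hypothetical.

```python
from dataclasses import dataclass

# Assumed, illustrative costs (not figures from the paper).
BANK_ACCESS_CYCLES = 3  # cycles to read one bank
HOP_CYCLES = 1          # one-way wire delay per bank of distance


def hit_latency(bank_index: int) -> int:
    """Hit time grows with distance: round-trip wire delay plus the bank access."""
    return BANK_ACCESS_CYCLES + 2 * HOP_CYCLES * bank_index


@dataclass
class BankColumn:
    """Toy column of banks; index 0 is closest to the processor, one line per bank."""
    banks: list  # resident tag per bank, or None if the bank is empty

    def access(self, tag):
        """Return the hit latency in cycles, or None on a miss.

        Assumed migration rule: on a hit, the line swaps places with the
        line in the next-closer bank, so repeated hits pull it inward.
        """
        for i, resident in enumerate(self.banks):
            if resident == tag:
                latency = hit_latency(i)
                if i > 0:
                    self.banks[i - 1], self.banks[i] = self.banks[i], self.banks[i - 1]
                return latency
        return None


column = BankColumn(["a", "b", "c", "d"])
latencies = [column.access("d") for _ in range(4)]
print(latencies)      # [9, 7, 5, 3]
print(column.banks)   # ['d', 'a', 'b', 'c']
```

In this toy model a line that starts in the farthest bank costs 9 cycles on its first hit and 3 cycles once it has migrated to the closest bank, while a uniform-latency cache of the same size would charge every hit the worst-case delay.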


