Partitioned first-level cache design for clustered microarchitectures
Source: International Conference on Supercomputing
Proceedings of the 17th annual international conference on Supercomputing
San Francisco, CA, USA
SESSION: Processor microarchitecture I
Pages: 22-31
Year of Publication: 2003
ISBN: 1-58113-733-8
Authors
Paul Racunas  University of Michigan, Ann Arbor, MI
Yale N. Patt  University of Texas at Austin, Austin, TX
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/782814.782820

ABSTRACT

The high clock frequencies of modern superscalar processors make the wire delay incurred in moving data across the processor chip a significant concern. As frequencies continue to increase, it will become more difficult for a centralized first level data cache to supply the timely data bandwidth required by superscalar processors.

This paper presents a complete solution for the partitioning of the first level of the memory hierarchy. The first level data cache is split into several independent partitions, which are arbitrarily distributable across the processor die. After being decoded, memory instructions are sent to the reservation stations of the functional unit adjacent to the cache partition that they are most likely to access. The partition assignments for both static instructions and cache data are dynamically changed to adapt to data access patterns. A data cache line is permitted to reside in only one partition at a time, allowing each store to update only a single partition, and allowing the partitioning and simplification of the memory disambiguation logic.

The partitioned cache achieves a reduction in cache access latency through a combination of reduced wire delay and reduced cache array size. A partitioned cache with eight 8KB direct-mapped partitions maintains a hit rate greater than that of a 32KB direct-mapped cache. A machine utilizing the partitioned cache outperforms a machine with a conventional 64KB direct-mapped cache by 4.5% and a machine with a 64KB 8-way set-associative cache by 7.0%, when cache latencies estimated through the use of the CACTI cache simulation tool are taken into account.
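The steering and single-residency ideas described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' design: the class name `PartitionedCache`, the 32-byte line size, the per-PC "last partition seen" predictor, and the `pc % PARTITIONS` default steering are all hypothetical choices made for illustration. The abstract also says that data assignments change dynamically; this sketch does not model line migration and only adapts the instruction-side assignment.

```python
LINE_BYTES = 32                 # assumed line size (not given in the abstract)
PARTITIONS = 8                  # eight partitions, as in the abstract
PARTITION_BYTES = 8 * 1024      # 8KB per partition, as in the abstract
SETS = PARTITION_BYTES // LINE_BYTES  # direct-mapped: one line per set


class PartitionedCache:
    """Toy model: each cache line lives in at most one partition at a time."""

    def __init__(self):
        # tags[p][s] holds the line address cached in set s of partition p.
        self.tags = [[None] * SETS for _ in range(PARTITIONS)]
        self.home = {}       # line address -> the one partition holding it
        self.predictor = {}  # static instruction PC -> predicted partition

    def steer(self, pc):
        """Choose the partition whose functional unit gets this instruction.

        Falls back to pc % PARTITIONS for unseen PCs (an assumed policy).
        """
        return self.predictor.get(pc, pc % PARTITIONS)

    def access(self, pc, addr):
        """Return 'local', 'remote', or 'miss' for one memory instruction."""
        line = addr // LINE_BYTES
        predicted = self.steer(pc)
        actual = self.home.get(line)
        if actual is None:
            # Line is in no partition: fill it where the instruction was sent.
            self._fill(line, predicted)
            actual, outcome = predicted, "miss"
        else:
            outcome = "local" if actual == predicted else "remote"
        # Adapt the static instruction's assignment to where the data was.
        self.predictor[pc] = actual
        return outcome

    def _fill(self, line, partition):
        index = line % SETS
        victim = self.tags[partition][index]
        if victim is not None:
            del self.home[victim]  # evicted line leaves its only partition
        self.tags[partition][index] = line
        self.home[line] = partition
```

In this model a second instruction that touches a line already held elsewhere sees one "remote" access, after which its predictor entry points at the holding partition and later accesses are "local". Because `home` maps each line to exactly one partition, a store would only ever need to update that one partition, which is the property the abstract relies on to simplify memory disambiguation.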



