Compiler managed micro-cache bypassing for high performance EPIC processors
Source: International Symposium on Microarchitecture
Proceedings of the 35th Annual ACM/IEEE International Symposium on Microarchitecture
Istanbul, Turkey
Session: Compiler scheduling
Pages: 134-145
Year of Publication: 2002
ISSN: 1072-4451, ISBN: 0-7695-1859-1
Authors
Youfeng Wu  Intel Corporation, Santa Clara, CA
Ryan Rakvic  Intel Corporation, Santa Clara, CA
Li-Ling Chen  Intel Corporation, Santa Clara, CA
Chyi-Chang Miao  Intel Corporation, Shrewsbury, MA
George Chrysos  Intel Corporation, Shrewsbury, MA
Jesse Fang  Intel Corporation, Santa Clara, CA
Sponsors
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE TC-uArch
Publisher
IEEE Computer Society Press  Los Alamitos, CA, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 14,   Citation Count: 4

ABSTRACT

Advanced microprocessors have pushed clock rates well beyond the gigahertz boundary. For such high-performance microprocessors, a small and fast data micro cache (ucache) is important to overall performance, and managing it properly through load bypassing has a significant performance impact. In this paper, we propose and evaluate a hardware-software collaborative technique to manage ucache bypassing for EPIC processors. The hardware supports ucache bypassing with a flag in the load instruction format, and the compiler employs static analysis and profiling to identify loads that should bypass the ucache. The collaborative method achieves a significant performance improvement for the SPECint2000 benchmarks. On average, about 40%, 30%, 24%, and 22% of load references are identified for bypassing with 256B, 1KB, 4KB, and 8KB ucaches, respectively. This reduces the ucache miss rates by 39%, 32%, 28%, and 26%, and reduces the number of pipeline stalls from loads to their uses by 13%, 9%, 6%, and 5%. Meanwhile, the L1 and L2 cache misses remain largely unchanged. For the 256B ucache, bypassing improves overall performance by 5% on average.
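
To make the idea of profile-guided bypassing concrete, the following is a minimal toy sketch, not the paper's algorithm. It assumes a direct-mapped ucache with 32-byte lines, assumes that a flagged load neither looks up nor allocates in the ucache, and uses a hypothetical per-load miss-rate threshold as the selection heuristic. The names `MicroCache`, `run`, `select_bypass`, `miss_rate`, and `toy_trace` are invented for illustration.

```python
from collections import defaultdict

LINE_BYTES = 32  # assumed line size; the abstract does not specify one


class MicroCache:
    """Direct-mapped cache (the organisation is an assumption for this sketch)."""

    def __init__(self, size_bytes, line_bytes=LINE_BYTES):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        line = addr // self.line_bytes
        index = line % self.num_lines
        if self.tags[index] == line:
            return True
        self.tags[index] = line
        return False


def run(trace, size_bytes, bypass_pcs=frozenset()):
    """Replay (pc, addr) loads; return {pc: [hits, misses]} for ucache accesses."""
    cache = MicroCache(size_bytes)
    stats = defaultdict(lambda: [0, 0])
    for pc, addr in trace:
        if pc in bypass_pcs:
            continue  # flagged load: assumed to skip the ucache entirely
        hit = cache.access(addr)
        stats[pc][0 if hit else 1] += 1
    return stats


def miss_rate(stats):
    """Overall miss rate across all loads that accessed the ucache."""
    hits = sum(h for h, _ in stats.values())
    misses = sum(m for _, m in stats.values())
    return misses / (hits + misses) if hits + misses else 0.0


def select_bypass(trace, size_bytes, threshold=0.9):
    """Hypothetical heuristic: flag loads whose profiled miss rate >= threshold."""
    profile = run(trace, size_bytes)
    return {pc for pc, (h, m) in profile.items() if m / (h + m) >= threshold}


def toy_trace(n=1000):
    """A hot load reusing 4 lines, interleaved with a streaming load."""
    trace = []
    for i in range(n):
        trace.append((0x10, (i % 4) * LINE_BYTES))   # hot, reusable data
        trace.append((0x20, 4096 + i * LINE_BYTES))  # streaming, never reused
    return trace


if __name__ == "__main__":
    trace = toy_trace()
    flagged = select_bypass(trace, 256)
    print("bypass PCs:", [hex(pc) for pc in sorted(flagged)])
    print("miss rate without bypass:", miss_rate(run(trace, 256)))
    print("miss rate with bypass:", miss_rate(run(trace, 256, flagged)))
```

In this toy trace the streaming load keeps evicting the hot lines, so flagging it lets the hot load retain its working set in the ucache. A real compiler would combine such profile data with static analysis and with scheduling decisions, which this sketch does not attempt to model.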



