ABSTRACT
Deep-submicron designs must account for process variation, because variations in critical process parameters cause large variations in the access latencies of hardware components. The problem is especially severe for memory components, which are built from minimum-sized transistors. In this work, we study the effect of access latency variation on performance, using on-chip data caches as the example. We first discuss the performance loss of a worst-case design, in which the entire cache operates at the worst-case process variation delay, and then examine process variation aware cache designs that work at set-level granularity. We then propose a technique called block rearrangement to minimize the performance loss incurred by a process variation aware cache that works at set-level granularity. Block rearrangement changes the physical locations of cache blocks so that the n blocks of a cache set (assuming an n-way set-associative cache) can reside in multiple rows rather than in a single row, as they do in a cache with a conventional addressing scheme. By distributing the blocks of a cache set over multiple rows, we minimize the number of sets affected by process variation. We evaluate our technique using the SPEC2000 CPU benchmarks and show that it achieves significant performance benefits over caches with a conventional addressing scheme.
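The difference between conventional addressing and a rearranged layout can be illustrated with a small sketch. The abstract does not give the actual mapping used by block rearrangement, so the diagonal skew below (way w of set s placed in row (s + w) mod R) is purely a hypothetical choice, as are the function names and the assumption that the number of rows equals the number of sets.

```python
# Illustrative sketch only: the skewed mapping is an assumption,
# not the mapping defined in the paper.

def conventional_row(set_index, way, num_rows):
    """Conventional addressing: every way of a set sits in the same physical row."""
    return set_index % num_rows

def rearranged_row(set_index, way, num_rows):
    """Hypothetical rearrangement: way w of set s is placed in row (s + w) mod num_rows."""
    return (set_index + way) % num_rows

def rows_of_set(mapping, set_index, num_ways, num_rows):
    """Return the sorted list of distinct physical rows holding the blocks of one set."""
    return sorted({mapping(set_index, w, num_rows) for w in range(num_ways)})

NUM_ROWS = 8   # assumed equal to the number of sets
NUM_WAYS = 4   # 4-way set-associative cache

print(rows_of_set(conventional_row, 2, NUM_WAYS, NUM_ROWS))  # [2]
print(rows_of_set(rearranged_row, 2, NUM_WAYS, NUM_ROWS))    # [2, 3, 4, 5]
```

Under this assumed mapping each (row, way) slot still holds exactly one block, so only the placement changes, not the capacity. How the placement should be chosen relative to the rows actually affected by process variation is left to the paper itself.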