ABSTRACT
The incidence of hard errors in CPUs is a challenge for future multicore designs because total core area keeps increasing. Even if the location and nature of hard errors are known a priori, either at manufacture time or in the field, cores with such errors must be disabled unless the design tolerates hard errors. Caches, with their regular and repetitive structures, are easily protected against hard errors by providing spare arrays or spare lines, but structures within a core are neither as regular nor as repetitive. Previous work has proposed microarchitectural core salvaging, which exploits structural redundancy within a core to maintain functionality in the presence of hard errors. Unfortunately, microarchitectural salvaging introduces complexity and may cover only a limited fraction of core area against hard errors, because the core lacks natural redundancy. This paper makes a case for architectural core salvaging. We observe that even if some individual cores cannot execute certain operations, a CPU die can remain instruction-set-architecture (ISA) compliant, that is, execute all of the instructions required by its ISA, by exploiting natural cross-core redundancy. We propose using hardware to migrate an offending thread to another core that can execute the operation. Architectural core salvaging can cover a large core area against faults, and it can be implemented with known techniques that minimize changes to the microarchitecture. We show that architectural core salvaging can be optimized so that performance on a faulty die approaches that of a fault-free die, assuring significantly better performance than core disabling for many workloads and no worse performance than core disabling for the remainder.
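
The cross-core redundancy argument can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism the paper proposes: the operation classes in `ISA`, the `Core` type, and the "move to the first capable core and stay there" policy are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical operation classes standing in for a real ISA (assumption).
ISA = frozenset({"int_alu", "int_mul", "fp_add", "fp_mul", "load_store"})


@dataclass(frozen=True)
class Core:
    core_id: int
    working_ops: frozenset  # operation classes this core can still execute


def die_is_isa_compliant(cores):
    """The die is compliant if every ISA operation works on at least one core."""
    covered = frozenset().union(*(c.working_ops for c in cores))
    return ISA <= covered


def run_thread(ops, cores, start_core=0):
    """Execute ops in order, migrating when the current core cannot run one.

    Assumed policy: migrate to the first core that supports the operation
    and remain there until another unsupported operation appears.
    Returns the (op, core_id) trace and the number of migrations.
    """
    current = cores[start_core]
    migrations = 0
    trace = []
    for op in ops:
        if op not in current.working_ops:
            target = next((c for c in cores if op in c.working_ops), None)
            if target is None:
                raise RuntimeError(f"no core on this die can execute {op!r}")
            current = target
            migrations += 1
        trace.append((op, current.core_id))
    return trace, migrations


# Two faulty cores: core 0 lost its FP multiplier, core 1 its integer multiplier.
cores = [Core(0, ISA - {"fp_mul"}), Core(1, ISA - {"int_mul"})]
trace, migrations = run_thread(["int_alu", "fp_mul", "int_mul", "load_store"], cores)
```

In this example neither core is ISA-compliant on its own, yet the die as a whole is, and the thread completes after two migrations. A die where both cores had lost the same unit would fail the compliance check, which is the case in which salvaging offers no benefit.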