Configurable isolation: building high availability systems with commodity multi-core processors
Source
International Symposium on Computer Architecture
Proceedings of the 34th Annual International Symposium on Computer Architecture
San Diego, California, USA
SESSION: Faults
Pages: 470-481
Year of Publication: 2007
ISBN: 978-1-59593-706-3
Authors
Nidhi Aggarwal  University of Wisconsin-Madison, Madison, WI
Parthasarathy Ranganathan  Hewlett Packard Labs, Palo Alto, CA
Norman P. Jouppi  Hewlett Packard Labs, Palo Alto, CA
James E. Smith  University of Wisconsin-Madison, Madison, WI
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS : Computer Society
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1250662.1250720

ABSTRACT

High availability is an increasingly important requirement for enterprise systems, often valued more than performance. Systems designed for high availability typically use redundant hardware for error detection and continued uptime in the event of a failure. Chip multiprocessors with an abundance of identical resources like cores, caches, and interconnection networks would appear to be ideal building blocks for implementing high availability solutions on chip. However, doing so poses significant challenges with respect to error containment and faulty component replacement. Increasing silicon and transient fault rates with future technology scaling exacerbate the problem. This paper proposes a novel, cost-effective architecture for high availability systems built from future multi-core processors. We propose a new chip multiprocessor architecture that provides configurable isolation for fault containment and component retirement, based upon cost-effective modifications to commodity designs. The design is evaluated for a state-of-the-art industrial fault model and the proposed architecture is shown to provide effective fault isolation and graceful degradation even when the failure rate is high.


