| StageNetSlice: a reconfigurable microarchitecture building block for resilient CMP systems |
Source
|
International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Atlanta, GA, USA
SESSION: Resiliency
Pages 1-10
Year of Publication: 2008
ISBN:978-1-60558-469-0
Authors
Shantanu Gupta, University of Michigan, Ann Arbor, MI, USA
Shuguang Feng, University of Michigan, Ann Arbor, MI, USA
Amin Ansari, University of Michigan, Ann Arbor, MI, USA
Jason Blome, University of Michigan, Ann Arbor, MI, USA
Scott Mahlke, University of Michigan, Ann Arbor, MI, USA
ABSTRACT
Although CMOS feature size scaling has been the source of dramatic performance gains, it has led to mounting reliability concerns due to increasing power densities and on-chip temperatures. Given that most wearout mechanisms that plague semiconductor devices are highly dependent on these parameters, significantly higher failure rates are projected for future technology generations. Traditional techniques for dealing with device failures have relied on coarse-grained redundancy to maintain service in the face of failed components. In this work, we challenge this practice by identifying its inability to scale to high failure rate scenarios and investigate the advantages of finer-grained configurations. We use this study to motivate the design of StageNet, an embedded CMP architecture designed from its inception with reliability as a first-class design constraint. StageNet relies on a reconfigurable network of replicated processor pipeline stages to maximize the useful lifetime of the chip, gracefully degrading performance toward end of life. This paper addresses the microarchitecture of the basic building block of StageNet, named StageNetSlice, which is a processor core composed of networked pipeline stages. A naive slice design results in approximately 4X slowdown versus a traditional processor due to longer communication delays in the pipeline. However, several small design changes that eliminate inter-stage communication paths and minimize communication bandwidth reduce this overhead to 11% on average while providing high levels of fine-grained adaptability.