End-to-end register data-flow continuous self-test
Source
International Symposium on Computer Architecture
Proceedings of the 36th Annual International Symposium on Computer Architecture
Austin, TX, USA
SESSION: Reliability and fault tolerance
Pages 105-115  
Year of Publication: 2009
ISBN:978-1-60558-526-0
Authors
Javier Carretero  Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain
Pedro Chaparro  Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain
Xavier Vera  Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain
Jaume Abella  Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain
Antonio González  Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1555754.1555770

ABSTRACT

While Moore's Law predicts the ability of the semiconductor industry to engineer smaller and more efficient transistors and circuits, it does not account for several serious issues. One concern is the verification effort for modern computing systems, which has grown to dominate the cost of system design. In addition, technology scaling is leading to the phase-out of burn-in. As a result, the in-the-field error rate may increase due to both actual errors and latent defects. Whereas data can be protected with error-detecting and error-correcting codes (like parity or ECC), there is a lack of cost-effective mechanisms for control logic.

This paper presents a lightweight microarchitectural mechanism that ensures that data consumed through registers are correct. Microarchitecture presents a new way to manage reliability and testing without significantly sacrificing cost and performance, offering a unique opportunity to detect errors in the field at low cost. Our results show coverage of around 90% for the targeted structures at a cost in power and area of about 4%. The protected structures include the issue queue logic and its associated data (i.e., tags, control signals), input multiplexers, rename data, replay logic, the register free list, bypass data and logic, MOB data and addresses, register file logic, register file storage, and functional units.
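
As background for the abstract's remark that data can be protected with codes like parity, the following is a minimal sketch of single-bit even parity over a data word. It is not the mechanism proposed in the paper; the function names (parity_bit, protect, check) and the 8-bit example word are hypothetical and chosen only for illustration.

```python
def parity_bit(value: int) -> int:
    """Return the even-parity bit: 1 if the word has an odd number of 1 bits."""
    return bin(value).count("1") % 2


def protect(value: int) -> tuple[int, int]:
    """Pair a data word with its parity bit, as a producer would."""
    return value, parity_bit(value)


def check(value: int, stored_parity: int) -> bool:
    """Recompute parity at the consumer and compare with the stored bit."""
    return parity_bit(value) == stored_parity


# Hypothetical 8-bit word used only for illustration.
word, p = protect(0b1011_0010)

flipped = word ^ (1 << 3)   # one bit corrupted
double = word ^ 0b11        # two bits corrupted

print(check(word, p))       # intact word passes the check
print(check(flipped, p))    # single-bit error is detected
print(check(double, p))     # double-bit error escapes plain parity
```

Parity of this kind covers stored or transported data words, but it says nothing about whether the right word was selected or delivered, which is the control-logic gap the abstract points to.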

