Analysis of a composite performance reliability measure for fault-tolerant systems
Source: Journal of the ACM (JACM)
Volume 34, Issue 1 (January 1987)
Pages: 179-199
Year of Publication: 1987
ISSN: 0004-5411
Authors
Lorenzo Donatiello, IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Balakrishna R. Iyer, IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/7531.7536

ABSTRACT

Today's concomitant needs for higher computing power and reliability have increased the relevance of multiple-processor fault-tolerant systems. Multiple functional units improve the raw performance (throughput, response time, etc.) of the system, and, as units fail, the system may continue to function albeit with degraded performance. Such systems and other fault-tolerant systems are not adequately characterized by separate performance and reliability measures. A composite measure for the performance and reliability of a fault-tolerant system observed over a finite mission time is analyzed. A Markov chain model is used for system state-space representation, and transient analysis is performed to obtain closed-form solutions for the density and moments of the composite measure. Only failures that cannot be repaired until the end of the mission are modeled. The time spent in a specific system configuration is assumed to be large enough to permit the use of a hierarchical model and static measures to quantify the performance of the system in individual configurations. For a multiple-processor system, where performance measures are usually associated with and aggregated over many jobs, this is tantamount to assuming that the time to process a job is much smaller than the time between failures. An extension of the results to general acyclic Markov chain models is included.
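The composite measure described above can be made concrete with a small numerical sketch. It is not the authors' closed-form derivation. It assumes a hypothetical system of n identical, independent, non-repairable processors, each failing at rate lam, so the number of working processors forms a pure-death (acyclic) Markov chain, and a hypothetical static reward rate reward(i) that quantifies performance in configuration i, in line with the hierarchical assumption in the abstract. The composite measure is then the accumulated reward Y = integral over [0, T] of reward(X(t)) dt for mission time T. The sketch obtains E[Y] by integrating the Kolmogorov forward equations with a fourth-order Runge-Kutta step, and samples failure paths to estimate the distribution of Y. All function names and parameters are illustrative.

```python
import math
import random


def forward_rhs(p, lam, reward):
    """Right-hand side of the forward equations for a pure-death chain.

    p[0..n] are state probabilities (p[i] = P(i units working)); p[n + 1]
    accumulates E[Y(t)], the expected reward earned so far.
    """
    n = len(p) - 2
    dp = [0.0] * len(p)
    for i in range(1, n + 1):
        flow = i * lam * p[i]  # probability flux of a failure out of state i
        dp[i] -= flow
        dp[i - 1] += flow      # no repair: state i only moves to i - 1
    dp[n + 1] = sum(reward(i) * p[i] for i in range(n + 1))
    return dp


def expected_accumulated_reward(n, lam, mission_time, reward, steps=2000):
    """E[Y] for Y = integral of reward(X(t)) over [0, mission_time], via RK4."""
    p = [0.0] * (n + 2)
    p[n] = 1.0  # all n units working at t = 0
    h = mission_time / steps
    for _ in range(steps):
        k1 = forward_rhs(p, lam, reward)
        k2 = forward_rhs([x + 0.5 * h * k for x, k in zip(p, k1)], lam, reward)
        k3 = forward_rhs([x + 0.5 * h * k for x, k in zip(p, k2)], lam, reward)
        k4 = forward_rhs([x + h * k for x, k in zip(p, k3)], lam, reward)
        p = [x + h / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p[n + 1]


def sample_accumulated_reward(n, lam, mission_time, reward, rng):
    """Draw one realization of Y by simulating a single failure path."""
    t, i, y = 0.0, n, 0.0
    while i > 0:
        hold = rng.expovariate(i * lam)  # next failure among i working units
        if t + hold >= mission_time:
            break                        # mission ends in configuration i
        y += reward(i) * hold
        t += hold
        i -= 1
    return y + reward(i) * (mission_time - t)


if __name__ == "__main__":
    n, lam, T = 4, 0.1, 10.0

    def throughput(i):
        return float(i)  # hypothetical: performance proportional to working units

    exact = n * (1 - math.exp(-lam * T)) / lam
    rng = random.Random(1)
    ys = [sample_accumulated_reward(n, lam, T, throughput, rng) for _ in range(20000)]
    print("closed form E[Y]:", exact)
    print("RK4 E[Y]:        ", expected_accumulated_reward(n, lam, T, throughput))
    print("Monte Carlo E[Y]:", sum(ys) / len(ys))
    print("P(Y >= 20) approx", sum(y >= 20 for y in ys) / len(ys))
```

With reward(i) = i, each processor contributes min(L, T) for an exponential lifetime L, so E[Y] = n(1 - e^(-lam T))/lam, which gives an independent check on both estimates. The sampled tail probability stands in, only approximately, for the density the paper derives in closed form.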



