Failure-tolerant parallel programming and its supporting system architecture
Source: AFIPS Joint Computer Conferences
Proceedings of the June 7-10, 1976, National Computer Conference and Exposition
New York, New York
SESSION: Systems: computer systems
Pages 413-423
Year of Publication: 1976
Authors
K. H. Kim, University of Southern California, Los Angeles, California
C. V. Ramamoorthy, University of California, Berkeley, California
Sponsor
AFIPS: American Federation of Information Processing Societies
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1499799.1499860

ABSTRACT

The current state of the art in software validation, together with the continuing growth in the size and complexity of software subsystems, more than justifies the extra cost of software error tolerance. A program that incorporates software redundancy, i.e., one in which procedures for run-time validation and recovery are explicitly specified, is generally called a failure-tolerant program. One problem in failure-tolerant programming, potentially serious in real-time computing environments, is the increase in program execution time caused by incorporating validation and recovery procedures. This paper introduces an approach to this problem, called failure-tolerant parallel programming. The essence of the approach is to overlap, as much as possible, main-stream computation with the redundant computation devoted to validation and recovery. A model system architecture tailored for efficient execution of failure-tolerant parallel programs is then described. It is highly general and modular, and it contains a novel memory subsystem named the duplex memory. Directions for further research on program structuring and on expanding the model architecture are also indicated.
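The central idea, overlapping main-stream computation with the redundant computation used for validation and recovery, can be illustrated with a minimal sketch. It is not the authors' design and does not model the duplex memory: it assumes a recovery-block style program in which each step has a primary routine, an alternate routine, and an acceptance test, and it assumes steps are free of side effects so that work started on an unvalidated result can simply be discarded. The names `Step`, `run_sequential`, and `run_overlapped` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical step format (an assumption, not from the paper):
# (primary, alternate, acceptance_test), where
# acceptance_test(old_state, new_state) returns True if new_state is acceptable.
Step = Tuple[Callable[[int], int], Callable[[int], int], Callable[[int, int], bool]]


def run_sequential(steps: List[Step], state: int) -> int:
    """Failure-tolerant execution with validation on the critical path."""
    for i, (primary, alternate, accept) in enumerate(steps):
        result = primary(state)
        if not accept(state, result):          # run-time validation
            result = alternate(state)          # recovery from the saved state
            if not accept(state, result):
                raise RuntimeError(f"step {i}: primary and alternate both failed")
        state = result
    return state


def run_overlapped(steps: List[Step], state: int) -> int:
    """Overlap the acceptance test of step i with the primary of step i+1.

    Assumes side-effect-free steps, so speculative work done on an
    unvalidated result can be discarded if validation fails.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        checkpoint = state                     # last validated state
        candidate = steps[0][0](checkpoint) if steps else state
        for i, (_, alternate, accept) in enumerate(steps):
            check = pool.submit(accept, checkpoint, candidate)
            # Speculatively run the next primary on the not-yet-validated result.
            spec = pool.submit(steps[i + 1][0], candidate) if i + 1 < len(steps) else None
            if check.result():
                checkpoint = candidate
                if spec is not None:
                    candidate = spec.result()  # speculation was valid; keep it
            else:
                if spec is not None:
                    spec.cancel()              # may already be running; result is ignored
                recovered = alternate(checkpoint)
                if not accept(checkpoint, recovered):
                    raise RuntimeError(f"step {i}: primary and alternate both failed")
                checkpoint = recovered
                if i + 1 < len(steps):
                    # Redo the next primary from the validated state.
                    candidate = steps[i + 1][0](checkpoint)
    return checkpoint


if __name__ == "__main__":
    demo: List[Step] = [
        (lambda s: s + 1, lambda s: s + 1, lambda old, new: new == old + 1),
        (lambda s: s * 100, lambda s: s * 2, lambda old, new: new == old * 2),  # faulty primary
        (lambda s: s - 3, lambda s: s - 3, lambda old, new: new == old - 3),
    ]
    print(run_sequential(demo, 5), run_overlapped(demo, 5))  # prints: 9 9
```

In `run_sequential` every acceptance test sits on the critical path. `run_overlapped` instead starts the next step's primary on the current step's unvalidated result while that result is being checked, and pays a rollback (discarding the speculative result and recomputing from the last validated state) only when a test fails. Under CPython's global interpreter lock, threads running pure-Python code do not execute truly in parallel, so the sketch shows the control structure rather than an actual speedup.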


