Performance analysis of a fault tolerant computer system
Source Joint International Conference on Measurement and Modeling of Computer Systems archive
Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems table of contents
Univ. of Colorado, Boulder, Colorado, United States
Pages: 249 - 250  
Year of Publication: 1990
ISBN:0-89791-359-0
Author
Lionel C. Mitchell  CTA Incorporated, 6116 Executive Boulevard, Rockville, Maryland
Sponsor
SIGMETRICS: ACM Special Interest Group on Measurement and Evaluation
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/98457.98769

ABSTRACT

This paper describes an analytical queueing network model of a Tandem computer system in the FAA Remote Maintenance Monitoring environment and presents a performance analysis of the Maintenance Processor Subsystem for the 1990s time frame. The approach was to use measurement data to quantify the application service demands and the performance contributions of the fault-tolerant software in the Tandem environment within an analytical queueing network model. Sensitivity analyses were conducted using the model to examine alternative configurations, workload growth, and system overhead, among other factors. The model framework and performance analysis methodology can be used for capacity planning during the operational phase of the system.
The Federal Aviation Administration (FAA) is responsible for many critical functions of the National Airspace System (NAS). Many of these functions have very high availability requirements. One such function is Remote Maintenance Monitoring (RMM). The FAA has implemented prototype versions of portions of this system on the Tandem fault-tolerant computer architecture. The Maintenance Processor Subsystem (MPS) implements monitor/control and management information functions within the FAA's Remote Maintenance Monitoring System (RMMS). MPSs are located at 23 Air Route Traffic Control Centers (ARTCCs) and various other FAA sites. These computers remotely monitor and control sensors. The RMMS components are in various stages of development. The MPS currently consists of a multi-processor Tandem configuration with initial versions of the monitor/control and management information software. Only a small number of remote sensors are currently monitored via point-to-point communication links.
The performance evaluation of the FAA's MPS involved the following steps: assess the functional and performance requirements; develop and validate a baseline model of the MPS prototype Tandem system; modify the baseline model to represent future MPS configuration and transaction requirements; and evaluate predicted performance. The functional and performance requirements of the MPS were determined primarily from FAA documentation and personnel. Performance data from a prototype MPS site at the Memphis ARTCC, collected by the Tandem XRAY monitor, were used to quantify model priority, service demand, and workload intensity parameters, and to validate the baseline model using response time and utilization metrics. Configuration specifications for the Memphis node were also collected for use in the model.
The model was developed using the CTA queueing network package Performance Analysis Tool Box (PATB). The model of the Tandem computer represents the non-stop processing operation implemented by Tandem's Transaction Monitoring Facility (TMF) and the mirrored disk writing operation. In addition, the model represents the GUARDIAN operating system priority scheduler, CPU burst size, interrupt processing, and memory swapping. The basic modeling approach was to use measurement data to represent the complex fault-tolerant activities in an analytical queueing network model. A model of the Memphis MPS node was developed to serve as a baseline for examining the performance of future ARTCC MPS configurations. PATB implements a Linearizer mean value analysis algorithm. The MPS functional and performance requirements and the XRAY measurement data were used to define the software, communication, and workload characteristics of the model.
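To make the mean value analysis idea concrete, here is a minimal sketch in Python. It uses the exact single-class MVA recursion, not the Linearizer approximation that PATB implements, and it ignores priorities, multiple workload classes, and the TMF delays. The station names, service demands, terminal count, and think time are hypothetical illustration values, not figures from the paper.

```python
def exact_mva(demands, n_customers, think_time=0.0):
    """Exact single-class MVA for a closed network of queueing stations.

    demands: dict mapping station name -> service demand per transaction (seconds).
    Returns (throughput, response_time, queue_lengths) at population n_customers.
    """
    queue = {name: 0.0 for name in demands}
    throughput = 0.0
    response = 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving customer sees the queue of the
        # network with one customer fewer.
        residence = {name: d * (1.0 + queue[name]) for name, d in demands.items()}
        response = sum(residence.values())
        throughput = n / (think_time + response)
        # Little's law applied to each station.
        queue = {name: throughput * r for name, r in residence.items()}
    return throughput, response, queue


if __name__ == "__main__":
    # Hypothetical service demands in seconds per transaction.
    demands = {"cpu": 0.05, "disk": 0.03, "link": 0.02}
    x, r, q = exact_mva(demands, n_customers=20, think_time=5.0)
    print(f"throughput={x:.3f}/s response={r:.3f}s")
    for name, d in demands.items():
        print(f"{name}: utilization={x * d:.2%} mean queue={q[name]:.3f}")
```

Utilization per station follows from the utilization law (throughput times service demand), which is the kind of metric the baseline model was checked against alongside response time.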
The XRAY measurement data, configuration information on the Memphis MPS node, and Tandem information were used to define the hardware and system software characteristics and to quantify the processing and I/O service demands for the application and system software. The basic components of the PATB model are: CPU, disk, and communication link hardware components; the application and system software program elements, including the fault-tolerant functions; and the application and overhead workload, or transaction, flows. The local terminals were implicitly represented as the source of the transactions. The Remote Monitoring Subsystem (RMS) sensor devices were represented as transaction sources and sinks. The interprocessor bus, the device controllers, and the I/O bus were not included in the model because their contribution to performance was judged to be insignificant based on examination of the measurement data. The fault-tolerant checkpoint functionality of Tandem's Transaction Monitoring Facility was represented by including the TMF processing and I/O activities as serial delays on the transaction flows for application workloads. The mirrored disk writing was reflected in the I/O service demand data from XRAY and did not require any further model representation. Memory contention was modeled in a separate PATB model. Both models assume a normal operational scenario (i.e., failure modes are not modeled).
The baseline performance model was validated using the XRAY data from the Memphis MPS site. The primary performance metric used in the model validation was average terminal response time. Model response time was within 15 percent of measured response time. One parameter examined in the validation exercise was CPU burst size. Using the average burst size instead of the operating system maximum provided better agreement between model results and measured results. The MPS baseline model was modified to represent different possible MPS configurations for the 1990s.
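The validation criterion described above (model response time within 15 percent of measured response time) can be expressed as a simple relative-error check. The sketch below is illustrative only: the function names are hypothetical, and the sample values are made-up placeholders, not XRAY measurements or PATB outputs.

```python
def relative_error(model_value, measured_value):
    """Absolute difference between model and measurement, relative to the measurement."""
    return abs(model_value - measured_value) / measured_value


def validate(model, measured, tolerance=0.15):
    """Return, per metric, whether the model is within the tolerance of the measurement."""
    return {
        name: relative_error(model[name], measured[name]) <= tolerance
        for name in measured
    }


if __name__ == "__main__":
    # Placeholder values for illustration; not data from the paper.
    measured = {"terminal_response_s": 1.00, "cpu_utilization": 0.40}
    model = {"terminal_response_s": 1.10, "cpu_utilization": 0.37}
    for name, ok in validate(model, measured).items():
        err = relative_error(model[name], measured[name])
        print(f"{name}: error={err:.1%} within_tolerance={ok}")
```

The same check can be rerun for each candidate parameter setting, such as average versus maximum CPU burst size, to see which gives the closer agreement.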
The changes in the model reflected additional and faster CPU, disk, and communication servers and modification of software CPU residency and workload flows. Various alternatives were examined for hardware and software configuration, number of sensor devices monitored, terminal transaction load, and system overhead and application software service demands. In addition to the detailed model of the application and system software, a flow-equivalent queueing network model was developed, using PATB, to examine the impact of memory queueing for the proposed configuration. The model was developed to examine the impact of: the operating system policy of “cloning” processes subject to a queue length threshold; additional application software functions not yet implemented; uncertainty in the expected transaction rate; and additional system software storage requirements.
The results of the analysis are being used by the FAA to define the MPS performance requirements for the 1995 time frame. The MPS model may be used in the future for capacity planning and performance optimization exercises for different MPS field configurations.
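The flow-equivalent approach to memory queueing can be sketched with the textbook construction: solve the CPU/disk subnetwork in isolation for each feasible number of transactions in memory, replace it with a single load-dependent server, and solve the resulting birth-death model with the terminals. This is a minimal single-class illustration of that general technique, not a reconstruction of the PATB model. The service demands, terminal count, think time, and memory limits are hypothetical, and `think_time` must be positive.

```python
def inner_throughputs(demands, max_in_memory):
    """Exact MVA throughput X(n) of the subnetwork for n = 1..max_in_memory."""
    queue = [0.0] * len(demands)
    rates = []
    for n in range(1, max_in_memory + 1):
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        x = n / sum(residence)
        queue = [x * r for r in residence]
        rates.append(x)
    return rates


def memory_constrained_response(demands, terminals, think_time, max_in_memory):
    """Throughput and response time with at most max_in_memory active transactions."""
    rates = inner_throughputs(demands, max_in_memory)

    def mu(k):
        # Flow-equivalent server: completion rate with k transactions present,
        # of which at most max_in_memory are actually being served.
        return rates[min(k, max_in_memory) - 1]

    # Birth-death chain on k = transactions at the server (in memory or waiting).
    weights = [1.0]
    for k in range(1, terminals + 1):
        arrival = (terminals - k + 1) / think_time
        weights.append(weights[-1] * arrival / mu(k))
    total = sum(weights)
    probs = [w / total for w in weights]

    throughput = sum(p * mu(k) for k, p in enumerate(probs) if k > 0)
    # Little's law over terminals plus system: N = X * (Z + R).
    response = terminals / throughput - think_time
    return throughput, response


if __name__ == "__main__":
    demands = [0.05, 0.03]  # hypothetical CPU and disk demands (seconds)
    for limit in (1, 2, 4, 8):
        x, r = memory_constrained_response(demands, terminals=10,
                                           think_time=0.5, max_in_memory=limit)
        print(f"memory limit={limit}: throughput={x:.2f}/s response={r:.3f}s")
```

Sweeping the memory limit or the terminal count in this way is one simple form of the sensitivity analysis the abstract describes for storage requirements and transaction rate.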