ABSTRACT
The monitoring of distributed systems involves the collection, interpretation, and display of information concerning the interactions among concurrently executing processes. This information and its display can support the debugging, testing, performance evaluation, and dynamic documentation of distributed systems. General problems associated with monitoring are outlined in this paper, and the architecture of a general-purpose, extensible, distributed monitoring system is presented. Three approaches to the display of process interactions are described: textual traces, animated graphical traces, and a combination of aspects of the textual and graphical approaches. The roles that each of these approaches fulfills in monitoring and debugging distributed systems are identified and compared. Monitoring tools for collecting communication statistics, detecting deadlock, controlling the non-deterministic execution of distributed systems, and using protocol specifications in monitoring are also described.
Our discussion is based on experience in the development and use of a monitoring system within a distributed programming environment called Jade. Jade was developed within the Computer Science Department of the University of Calgary and is now being used to support teaching and research at a number of universities and research organizations.
CITED BY 41
David B. Cavitt , C. Michael Overstreet , Kurt J. Maly, A performance monitoring application for distributed interactive simulations (DIS), Proceedings of the 29th conference on Winter simulation, p.421-428, December 07-10, 1997, Atlanta, Georgia, United States
Chang-Hyun Jo , Phil Sun Kim , Hyeung Sik Im , Eui Hyun Paik , Byung Sun Lee, A design and prototyping of an object-oriented program debugger, Proceedings of the 1997 ACM symposium on Applied computing, p.45-51, April 1997, San Jose, California, United States
Sivakumar Ravada , E. K. Park , Kia Makki, Automatic detection of errors in distributed systems, Proceedings of the 1995 ACM 23rd annual conference on Computer science, p.30-35, February 28-March 02, 1995, Nashville, Tennessee, United States
Hasina Abdu , Hanan Lutfiyya , Michael A. Bauer, Investigating monitoring configurations, Proceedings of the 1996 ACM symposium on Applied Computing, p.366-373, February 17-19, 1996, Philadelphia, Pennsylvania, United States
Hasina Abdu , Hanan L. Lutfiyya , Michael A. Bauer, An investigation of monitoring configurations, Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research, p.1, November 07-09, 1995, Toronto, Ontario, Canada
R. Hofmann , R. Klar , B. Mohr , A. Quick , M. Siegle, Distributed Performance Monitoring: Methods, Tools, and Applications, IEEE Transactions on Parallel and Distributed Systems, v.5 n.6, p.585-598, June 1994
Hasina Abdu , Hanan Lutfiyya , Michael A. Bauer, Monitoring overhead in distributed systems: visualization and estimation techniques, Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research, p.1, November 12-14, 1996, Toronto, Ontario, Canada
Paul Barham , Rebecca Isaacs , Richard Mortier , Dushyanth Narayanan, Magpie: online modelling and performance-aware systems, Proceedings of the 9th conference on Hot Topics in Operating Systems, p.15-15, May 18-21, 2003, Lihue, Hawaii
Paul Barham , Austin Donnelly , Rebecca Isaacs , Richard Mortier, Using magpie for request extraction and workload modelling, Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation, p.18-18, December 06-08, 2004, San Francisco, CA
REVIEW
Reviewer: Richard Thomas Snodgrass
Monitoring involves the collection, analysis, and presentation of dynamic
information concerning a computational process. A distributed system presents
difficulties in all three areas.
This paper outlines several approaches to the ana