ABSTRACT
Run-time tools are crucial to program development. In our desktop computing environments, we take for granted the availability of tools for operations such as debugging, profiling, tracing, checkpointing, and visualization. When programs move into distributed or Grid environments, such tools are difficult to find. The difficulty stems from the complex interactions required among the application program, the operating system, and the layers of job-scheduling and process-management software. As a result, each run-time tool must be ported individually to each job management system: for m tools and n environments, the problem becomes an m × n effort rather than the hoped-for m + n effort. Variations in the underlying operating systems can make the problem even worse. The consequence is a paucity of tools in distributed and Grid computing environments. In response, we have analyzed a variety of job-scheduling environments and run-time tools to better understand their interactions. From this analysis, we isolated what we believe are the essential interactions among the run-time tool, the job scheduler and resource manager, and the application program. We propose a standard interface, called the Tool Dæmon Protocol (TDP), that codifies these interactions and provides the necessary communication functions. We have implemented a pilot TDP library and experimented with Parador, a prototype that uses the Paradyn Parallel Performance Tools to profile jobs running under the Condor batch-scheduling environment.