ABSTRACT
Performance analysis tools are critical for the effective use of large parallel computing resources, but existing tools have failed to address three problems that limit their scalability: (1) management and processing of the volume of performance data generated when monitoring a large number of application processes, (2) communication between a large number of tool components, and (3) presentation of performance data and analysis results for applications with a large number of processes. In this paper, we present a novel approach for finding performance problems in applications with a large number of processes. The approach leverages our multicast and data aggregation infrastructure to address these three performance tool scalability barriers.

First, we show how to design a scalable, distributed performance diagnosis facility. We demonstrate this design with an on-line, automated strategy for finding performance bottlenecks. Our strategy uses distributed, independent bottleneck search agents located in the tool agent processes that monitor running application processes. Second, we present a technique for constructing compact displays of the results of our bottleneck detection strategy. This technique, called the Sub-Graph Folding Algorithm, presents bottleneck search results using dynamic graphs that record the refinement of a bottleneck search. The complexity of the results graph is controlled by combining sub-graphs showing similar local application behavior into a composite sub-graph.

Using an approach that combines these two synergistic parts, we performed bottleneck searches on programs with up to 1024 processes with no sign of tool resource saturation. With 1024 application processes, our visualization technique reduced a search results graph containing over 30,000 nodes to a single composite 44-node sub-graph showing the same qualitative performance information as the original graph.
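The folding idea can be illustrated with a minimal sketch. It is not the Sub-Graph Folding Algorithm as implemented in the tool; it assumes that "similar local behavior" means two processes produced the same sequence of search refinements, and it merges such sequences into shared composite nodes. The names `FoldedNode`, `fold`, and `count_nodes`, as well as the labels in the sample data, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FoldedNode:
    """Composite node standing for same-labelled nodes from many processes."""
    label: str
    processes: set = field(default_factory=set)
    children: dict = field(default_factory=dict)


def fold(per_process_paths):
    """Merge each process's search paths into one composite graph.

    per_process_paths maps a process id to a list of paths. Each path is a
    tuple of labels recording how the search was refined for that process.
    Assumption: two nodes are "similar" when their label paths are identical.
    """
    root = FoldedNode("root")
    for pid, paths in per_process_paths.items():
        root.processes.add(pid)
        for path in paths:
            node = root
            for label in path:
                # Reuse the composite child if it exists, else create it.
                node = node.children.setdefault(label, FoldedNode(label))
                node.processes.add(pid)
    return root


def count_nodes(node):
    """Count a node and all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Hypothetical search results: 1024 processes behave alike, one differs.
results = {
    pid: [("sync_wait", "barrier"), ("cpu_bound", "main", "compute")]
    for pid in range(1024)
}
results[7] = results[7] + [("io_blocking", "write_checkpoint")]

folded = fold(results)
unfolded_size = sum(len(path) for paths in results.values() for path in paths)
print("unfolded nodes:", unfolded_size)
print("folded nodes (including root):", count_nodes(folded))
```

Each composite node keeps the set of processes it stands for, so behavior exhibited by only a few processes (process 7 in the sample data) remains visible in the folded graph rather than being averaged away.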