Dynamic statistical profiling of communication activity in distributed applications
Source: Joint International Conference on Measurement and Modeling of Computer Systems
Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Marina Del Rey, California
SESSION: Distributed systems
Pages: 240 - 250  
Year of Publication: 2002
ISBN:1-58113-531-9
Author
Jeffrey Vetter, Lawrence Livermore National Laboratory, Livermore, CA
Sponsor
SIGMETRICS: ACM Special Interest Group on Measurement and Evaluation
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/511334.511364

ABSTRACT

Performance analysis of communication activity for a terascale application with traditional message tracing can be overwhelming in terms of overhead, perturbation, and storage. We propose a novel alternative that enables dynamic statistical profiling of an application's communication activity using message sampling. We have implemented an operational prototype, named PHOTON, and our evidence shows that this new approach can provide an accurate, low-overhead, tractable alternative for performance analysis of communication activity. PHOTON consists of two components: a Message Passing Interface (MPI) profiling layer that implements sampling and analysis, and a modified MPI runtime that appends a small but necessary amount of information to individual messages. More importantly, this alternative enables an assortment of runtime analysis techniques so that, in contrast to post-mortem, trace-based techniques, the raw performance data can be jettisoned immediately after analysis. Our investigation shows that message sampling can reduce overhead to imperceptible levels for many applications. Experiments on several applications demonstrate the viability of this approach. For example, with one application, our technique reduced the analysis overhead from 154% for traditional tracing to 6% for statistical profiling. We also evaluate different sampling techniques in this framework. The coverage of the sample space provided by purely random sampling is superior to counter- and timer-based sampling. Also, PHOTON's design reveals that frugal modifications to the MPI runtime system could facilitate such techniques on production computing systems, and it suggests that this sampling technique could execute continuously for long-running applications.
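
The three sampling policies compared in the abstract (counter-based, timer-based, and purely random) can be illustrated with a small sketch. This is not PHOTON's implementation: the function names, the synthetic message stream, and the sampling rates below are all hypothetical, chosen only to show one plausible way a periodic policy could cover the sample space poorly when the communication pattern is itself periodic. The abstract does not state why random sampling gave better coverage, so treat the aliasing effect shown here as an assumption. Each sampled message updates a running count and is then discarded, in the spirit of jettisoning raw data after analysis.

```python
import random

def counter_sampler(k):
    """Hypothetical policy: sample every k-th message."""
    count = 0
    def should_sample(timestamp):
        nonlocal count
        count += 1
        return count % k == 0
    return should_sample

def timer_sampler(interval):
    """Hypothetical policy: sample the first message seen once
    `interval` time units have passed since the previous sample."""
    next_due = interval
    def should_sample(timestamp):
        nonlocal next_due
        if timestamp >= next_due:
            next_due = timestamp + interval
            return True
        return False
    return should_sample

def random_sampler(p, seed=0):
    """Hypothetical policy: sample each message independently
    with probability p (seeded for reproducibility)."""
    rng = random.Random(seed)
    def should_sample(timestamp):
        return rng.random() < p
    return should_sample

def profile(messages, should_sample):
    """Count sampled messages per kind; unsampled messages are
    dropped immediately and no trace is stored."""
    counts = {}
    for timestamp, kind in messages:
        if should_sample(timestamp):
            counts[kind] = counts.get(kind, 0) + 1
    return counts

# Synthetic stream (an assumption): four message kinds repeating in a
# fixed order, one message per time unit.
messages = [(t, "kind%d" % (t % 4)) for t in range(1, 4001)]

counter_profile = profile(messages, counter_sampler(8))
timer_profile = profile(messages, timer_sampler(8))
random_profile = profile(messages, random_sampler(1 / 8))

print("counter:", sorted(counter_profile))
print("timer:  ", sorted(timer_profile))
print("random: ", sorted(random_profile))
```

On this synthetic stream the sampling period of 8 is a multiple of the pattern period of 4, so the counter- and timer-based policies only ever observe one message kind, while the random policy observes all four at roughly the same sampling rate. Real applications are less regular, so the gap would depend on the workload.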

