ABSTRACT
Performance analysis of communication activity for a terascale application with traditional message tracing can be overwhelming in terms of overhead, perturbation, and storage. We propose a novel alternative that enables dynamic statistical profiling of an application's communication activity using message sampling. We have implemented an operational prototype, named PHOTON, and our evidence shows that this new approach can provide an accurate, low-overhead, tractable alternative for performance analysis of communication activity. PHOTON consists of two components: a Message Passing Interface (MPI) profiling layer that implements sampling and analysis, and a modified MPI runtime that appends a small but necessary amount of information to individual messages. More importantly, this alternative enables an assortment of runtime analysis techniques so that, in contrast to post-mortem, trace-based techniques, the raw performance data can be jettisoned immediately after analysis. Our investigation shows that message sampling can reduce overhead to imperceptible levels for many applications. Experiments on several applications demonstrate the viability of this approach. For example, with one application, our technique reduced the analysis overhead from 154% for traditional tracing to 6% for statistical profiling. We also evaluate different sampling techniques in this framework. The coverage of the sample space provided by purely random sampling is superior to counter- and timer-based sampling. Also, PHOTON's design reveals that frugal modifications to the MPI runtime system could facilitate such techniques on production computing systems, and it suggests that this sampling technique could execute continuously for long-running applications.
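The three sampling policies compared in the abstract (counter-based, timer-based, and purely random) can be illustrated with a small sketch. This is not PHOTON's implementation, which lives in an MPI profiling layer and a modified MPI runtime. It is a minimal Python model under stated assumptions: messages are plain `(time, source, dest, bytes)` tuples, and the names `counter_sampler`, `timer_sampler`, `random_sampler`, and `profile` are hypothetical helpers introduced only for illustration.

```python
import random


def counter_sampler(period):
    """Counter-based policy: select every `period`-th message."""
    count = 0

    def should_sample(now):
        nonlocal count
        count += 1
        if count == period:
            count = 0
            return True
        return False

    return should_sample


def timer_sampler(interval):
    """Timer-based policy: select the first message seen once `interval` time units have elapsed."""
    next_deadline = interval

    def should_sample(now):
        nonlocal next_deadline
        if now >= next_deadline:
            next_deadline = now + interval
            return True
        return False

    return should_sample


def random_sampler(probability, seed=0):
    """Random policy: select each message independently with the given probability."""
    rng = random.Random(seed)  # seeded only so that runs are repeatable

    def should_sample(now):
        return rng.random() < probability

    return should_sample


def profile(messages, should_sample):
    """Accumulate bytes per (source, dest) pair for sampled messages only.

    Unsampled messages are never stored, mirroring the idea that raw data
    can be discarded immediately after analysis.
    """
    stats = {}
    sampled = 0
    for now, src, dst, nbytes in messages:
        if should_sample(now):
            sampled += 1
            stats[(src, dst)] = stats.get((src, dst), 0) + nbytes
    return sampled, stats


# Synthetic, strictly periodic traffic: rank r sends to rank (r + 1) % 4 in turn.
messages = [(t, t % 4, (t + 1) % 4, 1024) for t in range(100)]

counter_result = profile(messages, counter_sampler(4))
timer_result = profile(messages, timer_sampler(10))
random_result = profile(messages, random_sampler(0.25))
```

In this synthetic trace the counter period of 4 matches the period of the traffic, so the counter-based sampler only ever observes the pair `(3, 0)`. That is one way a fixed sampling period can cover the sample space poorly, whereas a random policy has no fixed period to align with the traffic.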