ABSTRACT
Initial implementations of parallel programs typically yield disappointing performance. Tuning to improve performance is thus a significant part of the parallel programming process. The effort required to tune a parallel program, and the level of performance that eventually is achieved, both depend heavily on the quality of the instrumentation that is available to the programmer.
This paper describes Quartz, a new tool for tuning parallel program performance on shared memory multiprocessors. The philosophy underlying Quartz was inspired by that of the sequential UNIX tool gprof: to appropriately direct the attention of the programmer by efficiently measuring just those factors that are most responsible for performance and by relating these metrics to one another and to the structure of the program. This philosophy is even more important in the parallel domain than in the sequential domain, because of the dramatically greater number of possible metrics and the dramatically increased complexity of program structures.
The principal metric of Quartz is normalized processor time: the total processor time spent in each section of code divided by the number of other processors that are concurrently busy when that section of code is being executed. Tied to the logical structure of the program, this metric provides a “smoking gun” pointing towards those areas of the program most responsible for poor performance. This information can be acquired efficiently by checkpointing to memory the number of busy processors and the state of each processor, and then statistically sampling these using a dedicated processor.
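The metric can be illustrated with a minimal sketch. This is not Quartz's implementation. It assumes each sample records, per processor, the name of the code section being executed (or `None` when idle), and that the divisor is the number of busy processors including the one being charged, so the divisor is never zero. The function name, the sample layout, and the section names are all hypothetical.

```python
from collections import defaultdict


def normalized_processor_time(samples, interval=1.0):
    """Accumulate normalized processor time per code section.

    samples:  list of snapshots; each snapshot lists, per processor, the
              section it is executing, or None if the processor is idle
              (a hypothetical layout, not Quartz's actual data format).
    interval: time represented by one sample.
    """
    totals = defaultdict(float)
    for states in samples:
        # Number of processors doing useful work in this snapshot.
        busy = sum(1 for s in states if s is not None)
        for section in states:
            if section is not None:
                # Charge this processor's time slice, scaled down by the
                # number of concurrently busy processors (assumed to
                # include the processor being charged).
                totals[section] += interval / busy
    return dict(totals)


# Four processors: a serial setup phase, two fully parallel samples,
# and a serial merge phase.
samples = [
    ["init", None, None, None],
    ["work", "work", "work", "work"],
    ["work", "work", "work", "work"],
    ["merge", None, None, None],
]
result = normalized_processor_time(samples)
print(result)
```

In this example `work` consumes eight processor-intervals of raw time but is weighted at only 2.0, while the serial `init` and `merge` sections each weigh 1.0. Serial sections therefore stand out far more than they would in a plain per-section time profile.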
In addition to describing the design rationale, functionality, and implementation of Quartz, the paper examines how Quartz would be used to solve a number of performance problems that have been reported as being frequently encountered, and describes a case study in which Quartz was used to significantly improve the performance of a CAD circuit verifier.