Continuous profiling: where have all the cycles gone?
Source
ACM Transactions on Computer Systems (TOCS)
Volume 15, Issue 4 (November 1997)
Pages: 357-390
Year of Publication: 1997
ISSN: 0734-2071
Authors
Jennifer M. Anderson, Digital Equipment Corp., Palo Alto, CA
Lance M. Berc, Digital Equipment Corp., Palo Alto, CA
Jeffrey Dean, Digital Equipment Corp., Palo Alto, CA
Sanjay Ghemawat, Digital Equipment Corp., Palo Alto, CA
Monika R. Henzinger, Digital Equipment Corp., Palo Alto, CA
Shun-Tak A. Leung, Digital Equipment Corporation, Palo Alto, CA
Richard L. Sites, Digital Equipment Corporation, Palo Alto, CA
Mark T. Vandevoorde, Digital Equipment Corporation, Palo Alto, CA
Carl A. Waldspurger, Digital Equipment Corporation, Palo Alto, CA
William E. Weihl, Digital Equipment Corporation, Palo Alto, CA
ABSTRACT
This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec. per 333 MHz processor), yet with low overhead (1–3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
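The sampling rate quoted in the abstract can be put in perspective with a little arithmetic. The sketch below is illustrative only and is not taken from the article's tools: it computes the average number of cycles between samples implied by the quoted figures, then shows the general idea of estimating where time goes by counting samples per program-counter address. The function names and the sample stream are hypothetical.

```python
from collections import Counter

CLOCK_HZ = 333_000_000   # 333 MHz processor, as quoted in the abstract
SAMPLES_PER_SEC = 5200   # the abstract says "over 5200", so this is a lower bound


def cycles_per_sample(clock_hz: int, samples_per_sec: int) -> float:
    """Average number of processor cycles between two consecutive samples."""
    return clock_hz / samples_per_sec


def attribute_samples(sampled_pcs):
    """Estimate each address's share of time as its fraction of all samples."""
    counts = Counter(sampled_pcs)
    total = sum(counts.values())
    return {pc: n / total for pc, n in counts.items()}


# Hypothetical sample stream: program-counter values observed at sample time.
samples = [0x1000, 0x1004, 0x1004, 0x1004, 0x1008, 0x1004, 0x1000, 0x1004]
shares = attribute_samples(samples)

print(f"about {cycles_per_sample(CLOCK_HZ, SAMPLES_PER_SEC):.0f} cycles between samples")
for pc, share in sorted(shares.items()):
    print(f"{pc:#x}: {share:.1%} of samples")
```

At the quoted rate, one sample is taken roughly every 64,000 cycles at most, which is consistent with the low overhead the abstract reports. The per-address shares are only a statistical estimate and sharpen as more samples accumulate.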
CITED BY 45
M. Burrows , U. Erlingson , S.-T. A. Leung , M. T. Vandevoorde , C. A. Waldspurger , K. Walker , W. E. Weihl, Efficient and flexible value sampling, ACM SIGPLAN Notices, v.35 n.11, p.160-167, Nov. 2000
Howard Chen , Wei-Chung Hsu , Jiwei Lu , Pen-Chung Yew , Dong-Yuan Chen, Dynamic trace selection using performance monitoring hardware sampling, Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, March 23-26, 2003, San Francisco, California
Xiao Zhang , Sandhya Dwarkadas , Girts Folkmanis , Kai Shen, Processor hardware counter statistics as a first-class system resource, Proceedings of the 11th USENIX workshop on Hot topics in operating systems, p.1-6, May 07-09, 2007, San Diego, CA
Peter F. Sweeney , Matthias Hauswirth , Brendon Cahoon , Perry Cheng , Amer Diwan , David Grove , Michael Hind, Using hardware performance monitors to understand the behavior of java applications, Proceedings of the 3rd conference on Virtual Machine Research And Technology Symposium, p.5-5, May 06-07, 2004, San Jose, California
Shashidhar Mysore , Banit Agrawal , Rodolfo Neuber , Timothy Sherwood , Nisheeth Shrivastava , Subhash Suri, Formulating and implementing profiling over adaptive ranges, ACM Transactions on Architecture and Code Optimization (TACO), v.5 n.1, p.1-32, May 2008
Murali Haran , Alan Karr , Michael Last , Alessandro Orso , Adam A. Porter , Ashish Sanil , Sandro Fouche, Techniques for Classifying Executions of Deployed Software to Support Software Engineering Tasks, IEEE Transactions on Software Engineering, v.33 n.5, p.287-304, May 2007
Alex Shye , Berkin Ozisikyilmaz , Arindam Mallik , Gokhan Memik , Peter A. Dinda , Robert P. Dick , Alok N. Choudhary, Learning and Leveraging the Relationship between Architecture-Level Measurements and Individual User Satisfaction, ACM SIGARCH Computer Architecture News, v.36 n.3, p.427-438, June 2008
Lei Gao , Jia Huang , Jianjiang Ceng , Rainer Leupers , Gerd Ascheid , Heinrich Meyr, TotalProf: a fast and accurate retargetable source code profiler, Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis, October 11-16, 2009, Grenoble, France