Continuous profiling: where have all the cycles gone?
Source: ACM Transactions on Computer Systems (TOCS)
Volume 15, Issue 4 (November 1997)
Pages: 357-390
Year of Publication: 1997
ISSN: 0734-2071
Authors
Jennifer M. Anderson, Digital Equipment Corp., Palo Alto, CA
Lance M. Berc, Digital Equipment Corp., Palo Alto, CA
Jeffrey Dean, Digital Equipment Corp., Palo Alto, CA
Sanjay Ghemawat, Digital Equipment Corp., Palo Alto, CA
Monika R. Henzinger, Digital Equipment Corp., Palo Alto, CA
Shun-Tak A. Leung, Digital Equipment Corp., Palo Alto, CA
Richard L. Sites, Digital Equipment Corp., Palo Alto, CA
Mark T. Vandevoorde, Digital Equipment Corp., Palo Alto, CA
Carl A. Waldspurger, Digital Equipment Corp., Palo Alto, CA
William E. Weihl, Digital Equipment Corp., Palo Alto, CA
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/265924.265925

ABSTRACT

This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec. per 333MHz processor), yet with low overhead (1–3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
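
As a rough illustration of the sampling idea the abstract describes, the sketch below aggregates program-counter samples into per-instruction time shares and derives the sampling interval implied by the quoted figures. It is not the system's actual implementation: the function names, the (image, pc) sample format, and the sample data are all hypothetical, chosen only to show the arithmetic.

```python
from collections import Counter


def cycles_per_sample(clock_hz: float, samples_per_sec: float) -> float:
    """Average number of processor cycles between two samples."""
    return clock_hz / samples_per_sec


def attribute_samples(samples):
    """Map each (image, pc) pair to its fraction of all samples.

    With periodic sampling, an instruction's share of the samples
    approximates its share of execution time.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


# Hypothetical samples: (image name, program counter). The images stand in
# for the kernel, a shared library, and a user program.
samples = [
    ("vmunix", 0x1000),
    ("libc.so", 0x2040),
    ("app", 0x3008),
    ("app", 0x3008),
]

shares = attribute_samples(samples)

# Figures quoted in the abstract: 5200 samples/sec on a 333 MHz processor.
period = cycles_per_sample(333e6, 5200)
```

Because the abstract says "over 5200 samples/sec.", the computed period of roughly 64,000 cycles is an upper bound on the average interval between samples.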


