Rapid profiling via stratified sampling
Source: International Symposium on Computer Architecture
Proceedings of the 28th Annual International Symposium on Computer Architecture
Göteborg, Sweden
Pages: 278 - 289  
Year of Publication: 2001
ISBN:0-7695-1162-7
Authors
S. Subramanya Sastry  Computer Sciences Dept., University of Wisconsin-Madison
Rastislav Bodík  Computer Sciences Dept., University of Wisconsin-Madison
James E. Smith  Dept. of ECE, University of Wisconsin-Madison
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS\TCCA: TC on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 18,   Downloads (12 Months): 53,   Citation Count: 14
DOI: http://doi.acm.org/10.1145/379240.379273

ABSTRACT

Sophisticated binary translators and dynamic optimizers demand a program profiler with low overhead, high accuracy, and the ability to collect a variety of profile types. A profiling scheme that achieves these goals is proposed. Conceptually, the hardware compresses a stream of profile data by counting identical events; the compressed profile data is passed to software for analysis. Compressing the high-bandwidth event stream greatly reduces software overhead. Because optimizations can tolerate some profiling errors, we allow the stream compressor to be lossy, thereby enabling a low-cost sampling-based hardware design. Because the hardware compressor is insensitive to the event content, it supports various profile types and can process multiple types simultaneously.
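
The idea of compressing an event stream by counting identical events can be illustrated with a small software model. This is only a conceptual sketch, not the hardware design from the paper: the function name `compress_by_counting` and the `batch_size` parameter are assumptions introduced here for illustration.

```python
from collections import Counter

def compress_by_counting(events, batch_size):
    """Yield one {event: count} summary per batch of raw events.

    Hypothetical software model: each summary stands in for the
    compressed profile data handed to the analysis software.
    """
    counts = Counter()
    for i, ev in enumerate(events, 1):
        counts[ev] += 1
        if i % batch_size == 0:
            yield dict(counts)   # copy before the counter is reset
            counts.clear()
    if counts:                   # flush a final partial batch
        yield dict(counts)

# Four raw events collapse into a single two-entry summary.
summaries = list(compress_by_counting(["a", "a", "b", "a"], batch_size=4))
```

The software side then handles one summary per batch rather than one record per event, which is the source of the overhead reduction the abstract describes.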

Basic components of our framework are periodic and random samplers, counters, and hash functions. These components are composed to form a variety of stream compressors. One design is both simple and very effective: the input stream is hash-split into multiple substreams, each of which is fed into a simple periodic sampler that selects every kth event. This stratified periodic sampler performs better than conventional random sampling because it biases each substream towards a small number of unique events, thereby reducing sampling error and allowing faster convergence to an accurate profile. For example, convergence to a given level of accuracy is about twice as fast for gcc. When sampling overhead is considered, the stratified periodic profiler achieves less than 3% error while incurring an overhead of only 3.5% for gcc.
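
A minimal software sketch of the stratified periodic sampler described above: events are hash-split into substreams, and each substream keeps every kth event it sees. The function names, the multiplicative hash, the integer event encoding, and the step of scaling sampled counts by k to estimate the profile are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter

def substream_of(event, num_substreams):
    """Assumed hash: a simple multiplicative hash on integer events."""
    return ((event * 2654435761) & 0xFFFFFFFF) % num_substreams

def stratified_periodic_sample(events, num_substreams=4, k=8):
    """Hash-split the stream, then keep every k-th event of each substream."""
    seen = [0] * num_substreams      # events observed so far, per substream
    sampled = Counter()
    for ev in events:
        s = substream_of(ev, num_substreams)
        seen[s] += 1
        if seen[s] % k == 0:         # periodic sampler: select every k-th event
            sampled[ev] += 1
    return sampled

def estimate_profile(sampled, k):
    """Assumed estimator: scale each sampled count back up by k."""
    return {ev: count * k for ev, count in sampled.items()}

# A stream of 80 identical events (e.g. one hot branch address), sampled with k=8.
samples = stratified_periodic_sample([0x400] * 80, num_substreams=4, k=8)
profile = estimate_profile(samples, k=8)
```

Because identical events always hash to the same substream, each sampler sees only a subset of the unique events, which is the bias towards a small number of unique events that the abstract credits for the lower sampling error.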


