ABSTRACT
Sophisticated binary translators and dynamic optimizers demand a program profiler with low overhead, high accuracy, and the ability to collect a variety of profile types. We propose a profiling scheme that achieves these goals. Conceptually, the hardware compresses a stream of profile data by counting identical events; the compressed profile data is passed to software for analysis. Compressing the high-bandwidth event stream greatly reduces software overhead. Because optimizations can tolerate some profiling errors, we allow the stream compressor to be lossy, thereby enabling a low-cost sampling-based hardware design. Because the hardware compressor is insensitive to the event content, it supports various profile types and can process multiple types simultaneously.
The basic components of our framework are periodic and random samplers, counters, and hash functions. These components are composed to form a variety of stream compressors. One design is both simple and very effective: the input stream is hash-split into multiple substreams, each of which is fed into a simple periodic sampler that selects every kth event. This stratified periodic sampler performs better than conventional random sampling because it biases each substream towards a small number of unique events, thereby reducing sampling error and allowing faster convergence to an accurate profile. For example, convergence to a given level of accuracy is about twice as fast for gcc. When sampling overhead is considered, the stratified periodic profiler achieves less than 3% error while incurring an overhead of only 3.5% for gcc.
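The stratified periodic sampler described above can be illustrated with a minimal software sketch. This is not the hardware design from the paper. The class name, the use of Python's built-in `hash` to split the stream, the parameter names `num_substreams` and `period`, and the scaling of sample counts by the period are all assumptions made for illustration.

```python
from collections import Counter


class StratifiedPeriodicSampler:
    """Illustrative model: hash-split a stream, then sample every kth event per substream."""

    def __init__(self, num_substreams=4, period=8):
        self.num_substreams = num_substreams  # assumed number of hash buckets
        self.period = period                  # k: select every kth event
        self.positions = [0] * num_substreams  # events seen since the last sample
        self.samples = Counter()               # sampled events and their counts

    def _substream(self, event):
        # Assumption: the built-in hash stands in for the hardware hash function.
        return hash(event) % self.num_substreams

    def observe(self, event):
        s = self._substream(event)
        self.positions[s] += 1
        if self.positions[s] == self.period:
            # Periodic sampler fires: record this event and restart the count.
            self.positions[s] = 0
            self.samples[event] += 1

    def estimate(self):
        # Each sample stands for roughly `period` events of its substream.
        return {event: count * self.period for event, count in self.samples.items()}


sampler = StratifiedPeriodicSampler(num_substreams=2, period=4)
for event in [0, 1, 0] * 8:  # 16 occurrences of event 0, 8 of event 1
    sampler.observe(event)
print(sampler.estimate())
```

In this toy stream each substream happens to contain a single unique event, so the periodic sampler recovers the counts exactly. With more unique events per substream the estimate becomes approximate, which is the lossy behavior the abstract refers to.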