|
ABSTRACT
Modern computer systems are called on to deal with billions of events every second, whether they are executed instructions, accessed memory locations, or forwarded packets. This presents a serious challenge to those who seek to quantify, analyze, or optimize such systems, because important trends and behaviors may easily be lost in a sea of data. We present range-adaptive profiling (RAP) as a new and general-purpose profiling method capable of hierarchically efficiently classifying streams of data in hardware. Through the use of RAP, events in an input stream are dynamically classified into increasingly precise categories, based on the frequency with which they occur. The more important a class, or range of events, the more precisely it is quantified. Despite the dynamic nature of our technique, we build upon tight theoretic bounds covering both worst-case error, as well as the required memory. In the limit, it is known that error and the memory bounds can be independent of the stream size and grow only linearly with the level of precision desired. Significantly, we expose the critical constants in these algorithms and through careful engineering, algorithm redesign, and use of heuristics, we show how a high-performance profile system can be implemented for range-adaptive profiling. RAP can be used on various profiles, such as PCs, load values, and memory addresses, and has a broad range of uses, from hot-region profiling to quantifying cache miss value locality. We propose two methods of implementation of RAP, one in software and the other with specialized hardware, for which we also describe our prototype FPGA implementation. We show that with just 8KB of memory, range profiles can be gathered with an average accuracy of 98%.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Agrawal, B. and Sherwood, T. 2006. Modeling tcam power for next generation network devices. In Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX.
|
 |
2
|
Jennifer M. Anderson , Lance M. Berc , Jeffrey Dean , Sanjay Ghemawat , Monika R. Henzinger , Shun-Tak A. Leung , Richard L. Sites , Mark T. Vandevoorde , Carl A. Waldspurger , William E. Weihl, Continuous profiling: where have all the cycles gone?, ACM Transactions on Computer Systems (TOCS), v.15 n.4, p.357-390, Nov. 1997
[doi> 10.1145/265924.265925]
|
 |
3
|
|
| |
4
|
|
| |
5
|
|
 |
6
|
|
| |
7
|
|
 |
8
|
Bruno De Bus , Dominique Chanet , Bjorn De Sutter , Ludo Van Put , Koen De Bosschere, The design and implementation of FIT: a flexible instrumentation toolkit, Proceedings of the 5th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, June 07-08, 2004, Washington DC, USA
[doi> 10.1145/996821.996833]
|
| |
9
|
Brad Calder , Peter Feller , Alan Eustace, Value profiling, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.259-269, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
 |
10
|
|
 |
11
|
|
 |
12
|
Thomas M. Conte , Burzin A. Patel , J. Stan Cox, Using branch handling hardware to support profile-driven optimization, Proceedings of the 27th annual international symposium on Microarchitecture, p.12-21, November 30-December 02, 1994, San Jose, California, United States
[doi> 10.1145/192724.192726]
|
| |
13
|
|
| |
14
|
Corporation, D. E. 1995. Alpha 21164 Microprocessor Hardware Reference Manual.
|
| |
15
|
Corporation, I. 1997. Pentium(r) Pro Processor Developer's Manual. McGraw-Hill, New York.
|
| |
16
|
Jeffrey Dean , James E. Hicks , Carl A. Waldspurger , William E. Weihl , George Chrysos, ProfileMe: hardware support for instruction-level profiling on out-of-order processors, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.292-302, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
 |
17
|
|
 |
18
|
Cristian Estan , Stefan Savage , George Varghese, Automatically inferring patterns of resource consumption in network traffic, Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications, August 25-29, 2003, Karlsruhe, Germany
[doi> 10.1145/863955.863972]
|
| |
19
|
fs2. First Silicon Solutions. Home page. http://www.fs2.com.
|
| |
20
|
Fung, W. W., and Sachdev, M. 2004. High performance priority encoder for content addressable memories. In Micronet R&D Annual Workshop.
|
| |
21
|
|
 |
22
|
|
| |
23
|
Hershberger, J., Shrivastava, N., Suri, S., and Toth, C. 2004. Adaptive spatial partitioning for multidimensional data streams. In International Symposium on Algorithms and Computation (ISAAC).
|
| |
24
|
Hewlett-Packard. 1994. PA-RISC 1.1 Architecture and Instruction Set Reference Manual.
|
| |
25
|
Hirzel, M. and Chilimbi, T. 2001. Bursty tracing: A framework for low-overhead temporal profiling. In Proceedings of the 4th ACM Workshop on Feedback-Directed and Dynamic Optimization (FDDO-4).
|
| |
26
|
Inc, M. T. 1995. Mips R10000 Microprocessor User's Manual.
|
| |
27
|
Quinn Jacobson , Eric Rotenberg , James E. Smith, Path-based next trace prediction, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.14-23, December 01-03, 1997, Research Triangle Park, North Carolina, United States
|
| |
28
|
|
| |
29
|
|
| |
30
|
|
 |
31
|
|
| |
32
|
Li, X., Liu, Z., Li, W., and Liu, B. 2004. SCP-TCAM: A power-efficient search engine for fast IP Lookup. In ISBN Proceedings.
|
| |
33
|
|
| |
34
|
|
 |
35
|
Chi-Keung Luk , Robert Cohn , Robert Muth , Harish Patil , Artur Klauser , Geoff Lowney , Steven Wallace , Vijay Janapa Reddi , Kim Hazelwood, Pin: building customized program analysis tools with dynamic instrumentation, Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, June 12-15, 2005, Chicago, IL, USA
|
| |
36
|
|
| |
37
|
|
| |
38
|
Pagiamtziz, K. and Sheikholeslami, A. 2004. A low-power content-addressable memory (CAM) using pipelined hierarchical search engine. IEEE J. Solid-State Circuits.
|
| |
39
|
Pagiamtzis, K. and Sheikholeslami, A. 2006. Content-addressable memory (CAM) circuits and architectures: A tutorial and survey. IEEE J. Solid-State Circuits. 41.
|
| |
40
|
Peri, R. V., Jinturkar, S., and Fajardo, L. 1999. A novel technique for profiling programs in embedded systems. In ACM Workshop on Feedback-Directed and Dynamic Optimization.
|
 |
41
|
|
| |
42
|
Sanchez, M., Biersack, E., and Dabbous, W. 2001. Survey and taxonomy of IP address lookup algorithms. IEEE Netw. Maga. 15, 2, 8--23.
|
 |
43
|
|
| |
44
|
Shivakumar, P., and Jouppi, N. 2001/2. Cacti 3.0: An Integrated Cache Timing, Power and Area Model.
|
 |
45
|
|
 |
46
|
|
 |
47
|
|
| |
48
|
Srivastava, A., Edwards, A., and Vo., H. 2001. Vulcan: Binary transformation in a distributed environment. Technical report msr-tr-2001-50. Microsoft Research., Redmond, Washington.
|
| |
49
|
|
| |
50
|
Wang, J. and Huang, C. 2000. High-speed and low-power CMOS priority encoders. IEEE J. Solid-State Circuits 35, 10 (Oct.), 1511--1514.
|
 |
51
|
|
 |
52
|
|
| |
53
|
|
 |
54
|
|
| |
55
|
|
 |
56
|
|
 |
57
|
Pin Zhou , Feng Qin , Wei Liu , Yuanyuan Zhou , Josep Torrellas, iWatcher: Efficient Architectural Support for Software Debugging, Proceedings of the 31st annual international symposium on Computer architecture, p.224, June 19-23, 2004, München, Germany
|
| |
58
|
|
|