ACM Home Page
Please provide us with feedback. Feedback
Formulating and implementing profiling over adaptive ranges
Full text PdfPdf (806 KB)
Source
ACM Transactions on Architecture and Code Optimization (TACO) archive
Volume 5 ,  Issue 1  (May 2008) table of contents
Article No. 2  
Year of Publication: 2008
ISSN:1544-3566
Authors
Shashidhar Mysore  University of California, Santa Barbara, California
Banit Agrawal  University of California, Santa Barbara, California
Rodolfo Neuber  University of California, Santa Barbara, California
Timothy Sherwood  University of California, Santa Barbara, California
Nisheeth Shrivastava  University of California, Santa Barbara, California
Subhash Suri  University of California, Santa Barbara, California
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 9,   Downloads (12 Months): 88,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1369396.1369398
What is a DOI?

ABSTRACT

Modern computer systems are called on to deal with billions of events every second, whether they are executed instructions, accessed memory locations, or forwarded packets. This presents a serious challenge to those who seek to quantify, analyze, or optimize such systems, because important trends and behaviors may easily be lost in a sea of data. We present range-adaptive profiling (RAP) as a new and general-purpose profiling method capable of hierarchically efficiently classifying streams of data in hardware. Through the use of RAP, events in an input stream are dynamically classified into increasingly precise categories, based on the frequency with which they occur. The more important a class, or range of events, the more precisely it is quantified. Despite the dynamic nature of our technique, we build upon tight theoretic bounds covering both worst-case error, as well as the required memory. In the limit, it is known that error and the memory bounds can be independent of the stream size and grow only linearly with the level of precision desired. Significantly, we expose the critical constants in these algorithms and through careful engineering, algorithm redesign, and use of heuristics, we show how a high-performance profile system can be implemented for range-adaptive profiling. RAP can be used on various profiles, such as PCs, load values, and memory addresses, and has a broad range of uses, from hot-region profiling to quantifying cache miss value locality. We propose two methods of implementation of RAP, one in software and the other with specialized hardware, for which we also describe our prototype FPGA implementation. We show that with just 8KB of memory, range profiles can be gathered with an average accuracy of 98%.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Agrawal, B. and Sherwood, T. 2006. Modeling tcam power for next generation network devices. In Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX.
2
3
 
4
 
5
6
 
7
8
 
9
10
11
12
 
13
 
14
Corporation, D. E. 1995. Alpha 21164 Microprocessor Hardware Reference Manual.
 
15
Corporation, I. 1997. Pentium(r) Pro Processor Developer's Manual. McGraw-Hill, New York.
 
16
17
18
 
19
fs2. First Silicon Solutions. Home page. http://www.fs2.com.
 
20
Fung, W. W., and Sachdev, M. 2004. High performance priority encoder for content addressable memories. In Micronet R&D Annual Workshop.
 
21
22
 
23
Hershberger, J., Shrivastava, N., Suri, S., and Toth, C. 2004. Adaptive spatial partitioning for multidimensional data streams. In International Symposium on Algorithms and Computation (ISAAC).
 
24
Hewlett-Packard. 1994. PA-RISC 1.1 Architecture and Instruction Set Reference Manual.
 
25
Hirzel, M. and Chilimbi, T. 2001. Bursty tracing: A framework for low-overhead temporal profiling. In Proceedings of the 4th ACM Workshop on Feedback-Directed and Dynamic Optimization (FDDO-4).
 
26
Inc, M. T. 1995. Mips R10000 Microprocessor User's Manual.
 
27
 
28
 
29
 
30
31
 
32
Li, X., Liu, Z., Li, W., and Liu, B. 2004. SCP-TCAM: A power-efficient search engine for fast IP Lookup. In ISBN Proceedings.
 
33
 
34
35
 
36
 
37
 
38
Pagiamtziz, K. and Sheikholeslami, A. 2004. A low-power content-addressable memory (CAM) using pipelined hierarchical search engine. IEEE J. Solid-State Circuits.
 
39
Pagiamtzis, K. and Sheikholeslami, A. 2006. Content-addressable memory (CAM) circuits and architectures: A tutorial and survey. IEEE J. Solid-State Circuits. 41.
 
40
Peri, R. V., Jinturkar, S., and Fajardo, L. 1999. A novel technique for profiling programs in embedded systems. In ACM Workshop on Feedback-Directed and Dynamic Optimization.
41
 
42
Sanchez, M., Biersack, E., and Dabbous, W. 2001. Survey and taxonomy of IP address lookup algorithms. IEEE Netw. Maga. 15, 2, 8--23.
43
 
44
Shivakumar, P., and Jouppi, N. 2001/2. Cacti 3.0: An Integrated Cache Timing, Power and Area Model.
45
46
47
 
48
Srivastava, A., Edwards, A., and Vo., H. 2001. Vulcan: Binary transformation in a distributed environment. Technical report msr-tr-2001-50. Microsoft Research., Redmond, Washington.
 
49
 
50
Wang, J. and Huang, C. 2000. High-speed and low-power CMOS priority encoders. IEEE J. Solid-State Circuits 35, 10 (Oct.), 1511--1514.
51
52
 
53
54
 
55
56
57
 
58

Collaborative Colleagues:
Shashidhar Mysore: colleagues
Banit Agrawal: colleagues
Rodolfo Neuber: colleagues
Timothy Sherwood: colleagues
Nisheeth Shrivastava: colleagues
Subhash Suri: colleagues