A performance counter architecture for computing accurate CPI components
Source: Architectural Support for Programming Languages and Operating Systems
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems
San Jose, California, USA
Session: Estimation and prediction of power and performance
Pages: 175 - 184  
Year of Publication: 2006
ISBN:1-59593-451-0
Authors
Stijn Eyerman  Ghent University
Lieven Eeckhout  Ghent University
Tejas Karkhanis  University of Wisconsin-Madison
James E. Smith  University of Wisconsin-Madison
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
SIGPLAN: ACM Special Interest Group on Programming Languages
SIGOPS: ACM Special Interest Group on Operating Systems
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1168857.1168880

ABSTRACT

A common way of representing processor performance is to use Cycles per Instruction (CPI) "stacks", which break performance into a baseline CPI plus a number of individual miss event CPI components. CPI stacks can be very helpful in gaining insight into the behavior of an application on a given microprocessor; consequently, they are widely used by software application developers and computer architects. However, computing CPI stacks on superscalar out-of-order processors is challenging because of various overlaps among execution and miss events (cache misses, TLB misses, and branch mispredictions).

This paper shows that meaningful and accurate CPI stacks can be computed for superscalar out-of-order processors. Using interval analysis, a novel method for analyzing out-of-order processor performance, we gain insight into the performance impact of the various miss events. Based on this understanding, we propose a novel way of architecting hardware performance counters for building accurate CPI stacks. The additional hardware needed to implement these counters is limited and comparable to that of existing hardware performance counter architectures, while the resulting CPI stacks are significantly more accurate than those of previous approaches.
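To make the notion of a CPI stack concrete, the following minimal Python sketch splits an overall CPI into a base component plus one component per miss event. It is an illustration only, not the counter architecture proposed in the paper: the function name `cpi_stack`, the event names, and all counter values are hypothetical, and the sketch assumes that cycles have already been attributed to each miss event without double counting. On an out-of-order processor that attribution is exactly the difficult part, because miss events overlap with each other and with useful execution.

```python
from typing import Dict


def cpi_stack(instructions: int, total_cycles: int,
              event_cycles: Dict[str, int]) -> Dict[str, float]:
    """Split overall CPI into a base component plus one component per miss event.

    Assumes event_cycles holds non-overlapping cycle counts, i.e. each lost
    cycle is charged to exactly one miss event (a simplifying assumption).
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    attributed = sum(event_cycles.values())
    if attributed > total_cycles:
        raise ValueError("attributed cycles exceed total cycles")

    # Cycles not charged to any miss event form the baseline CPI component.
    stack = {"base": (total_cycles - attributed) / instructions}
    for name, cycles in event_cycles.items():
        stack[name] = cycles / instructions
    return stack


# Hypothetical counter readings, chosen only for illustration.
example = cpi_stack(
    instructions=1_000_000,
    total_cycles=1_500_000,
    event_cycles={
        "l2_dcache_miss": 400_000,
        "branch_mispredict": 250_000,
        "dtlb_miss": 50_000,
    },
)

for component, value in example.items():
    print(f"{component:>18}: {value:.2f}")
```

By construction the components sum to the overall CPI (total cycles divided by instructions), which is what lets a stack be read as a breakdown of where cycles go.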

