METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies
Source
ACM Transactions on Programming Languages and Systems (TOPLAS) archive
Volume 29, Issue 2 (April 2007)
Article No. 12
Year of Publication: 2007
ISSN: 0164-0925
Authors
Jaydeep Marathe  North Carolina State University, Raleigh, NC
Frank Mueller  North Carolina State University, Raleigh, NC
Tushar Mohan  IBM India Research Lab, Hauz Khas, New Delhi
Sally A. McKee  Cornell University, Ithaca, NY
Bronis R. de Supinski  Lawrence Livermore National Laboratory, Livermore, CA
Andy Yoo  Lawrence Livermore National Laboratory, Livermore, CA
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10, Downloads (12 Months): 97, Citation Count: 1
DOI: 10.1145/1216374.1216380 (http://doi.acm.org/10.1145/1216374.1216380)

ABSTRACT

As the gap between CPU speeds and memory access latencies widens, detecting and removing memory access bottlenecks becomes increasingly important. In this work, we present METRIC, a software framework for isolating and understanding such bottlenecks using partial access traces. METRIC extracts access traces from executing programs without special compiler or linker support. We make four primary contributions. First, we present a framework for extracting partial access traces based on dynamic binary rewriting of the executing application. Second, we introduce a novel algorithm for compressing these traces; it generates constant-space representations for regular accesses occurring in nested loop structures. Third, we use these traces for offline incremental memory hierarchy simulation. We extract symbolic information from the application executable and use it to generate detailed source-code-correlated statistics, including per-reference metrics, cache evictor information, and stream metrics. Finally, we demonstrate how this information can be used to isolate and understand memory access inefficiencies. For sample codes, this illustrates a potential advantage of METRIC over compile-time analysis, particularly when interprocedural analysis is required.
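
The idea behind a constant-space representation for regular nested-loop accesses can be illustrated with a simplified sketch. Assuming a single memory reference traced inside a two-level loop nest, the Python below greedily folds the address stream first into stride runs (inner loop), then folds consecutive runs of identical shape whose base advances by a fixed amount into one outer descriptor. The names `Run`, `Nest`, `fold_runs`, `fold_nests`, and `compress` are hypothetical; this is not METRIC's algorithm and omits much of the generality a real trace compressor needs (deeper nests, interleaved references from several instructions).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    """Hypothetical inner-loop descriptor: `count` addresses base, base+stride, ..."""
    base: int
    stride: int
    count: int

    def expand(self) -> List[int]:
        return [self.base + k * self.stride for k in range(self.count)]


@dataclass(frozen=True)
class Nest:
    """Hypothetical outer-loop descriptor: `reps` copies of `inner`, each shifted by `outer_stride`."""
    inner: Run
    outer_stride: int
    reps: int

    def expand(self) -> List[int]:
        return [a + r * self.outer_stride
                for r in range(self.reps) for a in self.inner.expand()]


def fold_runs(addrs: List[int]) -> List[Run]:
    """Greedily group consecutive addresses that share a constant stride."""
    if not addrs:
        return []
    runs: List[Run] = []
    base, stride, count = addrs[0], 0, 1
    for a in addrs[1:]:
        if count == 1:                       # second element fixes the stride
            stride, count = a - base, 2
        elif a == base + stride * count:     # stride continues
            count += 1
        else:                                # stride broken: close the run
            runs.append(Run(base, stride, count))
            base, stride, count = a, 0, 1
    runs.append(Run(base, stride, count))
    return runs


def fold_nests(runs: List[Run]) -> List[Nest]:
    """Merge consecutive runs with equal stride/count whose bases step by a constant."""
    nests: List[Nest] = []
    i = 0
    while i < len(runs):
        first, reps, outer = runs[i], 1, 0
        j = i + 1
        while (j < len(runs) and runs[j].stride == first.stride
               and runs[j].count == first.count):
            delta = runs[j].base - runs[j - 1].base
            if reps == 1:
                outer = delta                # second run fixes the outer stride
            elif delta != outer:
                break
            reps += 1
            j += 1
        nests.append(Nest(first, outer, reps))
        i = j
    return nests


def compress(addrs: List[int]) -> List[Nest]:
    return fold_nests(fold_runs(addrs))


if __name__ == "__main__":
    # A[i][j] for i < 100, j < 4 over rows of eight 8-byte elements (64-byte rows)
    trace = [i * 64 + j * 8 for i in range(100) for j in range(4)]
    print(compress(trace))
    # prints [Nest(inner=Run(base=0, stride=8, count=4), outer_stride=64, reps=100)]
```

Here 400 addresses collapse into one descriptor whose size does not grow with either trip count, while an irregular stream simply degrades into many short descriptors that still expand back to the exact original order.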


REVIEW

Reviewer: Olivier Louis Marie Lecarme

A long time ago, computer hardware was designed in order to efficiently execute the code generated by compilers for higher-level programming languages. Now, computer hardware is designed in order to claim extraordinary performances, but the burden ...
