ABSTRACT
Performance tools based on hardware counters can efficiently profile the cache behavior of an application and help software developers improve its cache utilization. Simulator-based tools can potentially provide more insights and flexibility and model many different cache configurations, but have the drawback of large run-time overhead. We present StatCache, a performance tool based on a statistical cache model. It has a small run-time overhead while providing much of the flexibility of simulator-based tools. A monitor process running in the background collects sparse memory access statistics about the analyzed application running natively on a host computer. Generic locality information is derived and presented in a code-centric and/or data-centric view. We evaluate the accuracy and performance of the tool using ten SPEC CPU2000 benchmarks. We also exemplify how the flexibility of the tool can be used to better understand the characteristics of cache-related performance problems.