Accurate memory signatures and synthetic address traces for HPC applications
Source
International Conference on Supercomputing
Proceedings of the 22nd annual international conference on Supercomputing
Island of Kos, Greece
Session: Performance evaluation 1
Pages 36-45
Year of Publication: 2008
ISBN: 978-1-60558-158-3
Authors
Jonathan Weinberg  University of California, San Diego, San Diego, CA, USA
Allan Edward Snavely  San Diego Supercomputer Center, San Diego, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1375527.1375536

ABSTRACT

Though the performance of many scientific codes is dominated by memory behavior, our ability to describe, capture, compare, and recreate that behavior is quite limited. This inability underlies much of the complexity in the field of performance analysis: it is fundamentally difficult to relate benchmarks to applications or to use realistic workloads to guide system design and procurement. An observable, reproducible, and machine-independent memory characterization is needed.

The Chameleon framework is a software suite that includes tools to capture a concise, machine-independent memory signature from any application and to produce synthetic memory address traces that mimic that signature. By modeling spatial and temporal locality simultaneously, Chameleon produces uniquely accurate, general-purpose synthetic traces. Our results demonstrate that the cache hit rates generated by each synthetic trace are nearly identical to those of the application it targets on dozens of memory hierarchies representing many of today's commercial offerings.
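The abstract does not spell out Chameleon's signature format. As a minimal sketch of the general idea, assuming temporal locality is summarized as a histogram of LRU stack (reuse) distances, the code below captures such a histogram from an address trace, draws a synthetic trace from it, and compares fully associative LRU hit rates. It models temporal locality only; the spatial-locality half of Chameleon's model is not shown, and every function name here (`reuse_distances`, `signature`, `synthesize`, `lru_hit_rate`) is hypothetical.

```python
import random
from collections import Counter

COLD = -1  # marker for a first-time (compulsory-miss) reference


def reuse_distances(trace):
    """LRU stack distance of each reference: the number of distinct
    addresses touched since the previous access to the same address."""
    stack, out = [], []  # stack[0] is the most recently used address
    for addr in trace:
        if addr in stack:
            d = stack.index(addr)
            stack.pop(d)
        else:
            d = COLD
        stack.insert(0, addr)  # move (or push) to the top of the stack
        out.append(d)
    return out


def signature(trace):
    """Hypothetical temporal-locality signature: a reuse-distance histogram."""
    return Counter(reuse_distances(trace))


def lru_hit_rate(trace, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` blocks.
    A reference hits exactly when its stack distance is below `capacity`."""
    ds = reuse_distances(trace)
    hits = sum(1 for d in ds if 0 <= d < capacity)
    return hits / len(ds)


def synthesize(sig, length, seed=0):
    """Draw a synthetic trace whose reuse distances are sampled from `sig`."""
    rng = random.Random(seed)
    dists = list(sig)
    weights = [sig[d] for d in dists]
    stack, trace, next_addr = [], [], 0
    for _ in range(length):
        d = rng.choices(dists, weights)[0]
        if d == COLD or d >= len(stack):
            addr = next_addr  # fresh address: measured as a cold miss
            next_addr += 1
        else:
            addr = stack.pop(d)  # reuse the block at stack depth d
        stack.insert(0, addr)
        trace.append(addr)
    return trace


if __name__ == "__main__":
    app = list(range(8)) * 50  # toy "application": a cyclic sweep over 8 blocks
    synth = synthesize(signature(app), len(app), seed=1)
    for c in (4, 7, 8, 16):
        print(c, lru_hit_rate(app, c), lru_hit_rate(synth, c))
```

Because a reference hits in a fully associative LRU cache of `c` blocks exactly when its stack distance is below `c`, one histogram predicts hit rates for every capacity at once, which is what lets a single machine-independent signature be compared across many cache sizes. Set-associative caches, differing line sizes, and multi-level hierarchies need more than this toy model.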

We apply the framework to high-performance computing (HPC) by leveraging sampling techniques to capture the memory signatures of full-scale, parallel applications with only a 5x slowdown. The overall result is therefore a concise, observable, and machine-independent representation of the memory requirements of full-scale applications that can be tractably captured and accurately mimicked.
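The abstract does not say which sampling techniques are used. One simple scheme, shown here purely as a hypothetical illustration, is periodic window sampling: analyze `on` consecutive references, skip the next `off`, and merge the per-window reuse-distance histograms. Only the fraction on / (on + off) of the reference stream pays the analysis cost. The names `window_distances` and `sampled_signature` are made up for this sketch.

```python
from collections import Counter

COLD = -1  # marker for a first-time reference within a sampled window


def window_distances(window):
    """LRU stack distances computed only within one sampled window."""
    stack, out = [], []
    for addr in window:
        if addr in stack:
            d = stack.index(addr)
            stack.pop(d)
        else:
            d = COLD
        stack.insert(0, addr)
        out.append(d)
    return out


def sampled_signature(trace, on, off):
    """Analyze `on` references, skip the next `off`, and repeat.
    Returns the merged distance histogram and the fraction analyzed."""
    sig = Counter()
    analyzed = 0
    period = on + off
    for start in range(0, len(trace), period):
        window = trace[start:start + on]
        sig.update(window_distances(window))
        analyzed += len(window)
    return sig, analyzed / len(trace)
```

A known weakness of this naive version is that a reuse spanning a window boundary is recorded as cold, so long reuse distances are under-counted; practical samplers typically add a warm-up period or lengthen the windows to compensate.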


