The implications of working set analysis on supercomputing memory hierarchy design
Source
International Conference on Supercomputing
Proceedings of the 19th annual international conference on Supercomputing
Cambridge, Massachusetts
SESSION: Session 9: operating systems
Pages: 332 - 340
Year of Publication: 2005
ISBN:1-59593-167-8
ABSTRACT
Supercomputer architects strive to maximize the performance of scientific applications. Unfortunately, the large, unwieldy nature of most scientific applications has led to the creation of artificial benchmarks, such as SPEC-FP, for architecture research. Given the impact that these benchmarks have on architecture research, this paper seeks an understanding of how they relate to real-world applications within the Department of Energy (DOE). Since the memory system has been found to be a key issue for many applications, the paper focuses on how the SPEC-FP benchmarks and the DOE applications use the memory system. The results indicate that while SPEC-FP is a well-balanced suite, supercomputing applications typically demand more from the memory system and must perform more "other work" (in the form of integer computations) along with the floating-point operations. The SPEC-FP suite generally demonstrates slightly more temporal locality, leading to somewhat lower bandwidth demands. The most striking result is the cumulative difference between the benchmarks and the applications in what they require to sustain the floating-point operation rate: the DOE applications require significantly more data from main memory (not cache) per FLOP and dramatically more integer instructions per FLOP.
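The two per-FLOP requirements named in the abstract can be illustrated with a minimal sketch. It assumes that aggregate counts of floating-point operations, integer instructions, and bytes fetched from main memory are already available for a run. The `CounterTotals` type, the `per_flop_ratios` helper, and the numbers in the usage lines are hypothetical and are not taken from the paper or its measurement tools.

```python
from dataclasses import dataclass


@dataclass
class CounterTotals:
    """Hypothetical aggregate counts for one program run."""
    flops: int               # floating-point operations executed
    int_instructions: int    # integer instructions executed
    main_memory_bytes: int   # bytes fetched from main memory (cache misses only)


def per_flop_ratios(totals: CounterTotals) -> dict:
    """Return main-memory bytes per FLOP and integer instructions per FLOP."""
    if totals.flops <= 0:
        raise ValueError("flops must be positive to form a per-FLOP ratio")
    return {
        "bytes_per_flop": totals.main_memory_bytes / totals.flops,
        "int_per_flop": totals.int_instructions / totals.flops,
    }


# Made-up counts, chosen only to show the call; they are not measured data.
example = CounterTotals(flops=4, int_instructions=2, main_memory_bytes=8)
print(per_flop_ratios(example))
```

Comparing these ratios for a benchmark run and an application run is one way to express the kind of difference the abstract describes. How the underlying counts are collected is outside the scope of this sketch.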
CITED BY 2
Oreste Villa , Gianluca Palermo , Cristina Silvano, Efficiency and scalability of barrier synchronization on NoC based many-core architectures, Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, October 19-24, 2008, Atlanta, GA, USA