Scientific applications vs. SPEC-FP: a comparison of program behavior

Source
International Conference on Supercomputing
Proceedings of the 20th annual international conference on Supercomputing
Cairns, Queensland, Australia
SESSION: Benchmarking and modeling
Pages: 66-74
Year of Publication: 2006
ISBN: 1-59593-282-8

Authors
Kyle Rupnow (Univ. of Wisconsin, Madison, WI and Sandia National Labs, Albuquerque, NM)
Arun Rodrigues (Univ. of Notre Dame, Notre Dame, IN)
Keith Underwood (Sandia National Labs, Albuquerque, NM)
Katherine Compton (Univ. of Wisconsin, Madison, WI)

ABSTRACT
Many modern scientific applications execute on massively parallel collections of microprocessors. Supercomputers such as the Cray XT3 (Red Storm) and Blue Gene/L support thousands to tens of thousands of processors per parallel job. Even so, the performance of each individual microprocessor remains a critical component of overall performance. Traditional approaches to improving scientific application performance concentrate on floating-point (FP) instructions; however, our studies show that in the scientific applications used at Sandia National Labs, integer instructions constitute a large and critical part of the instruction mix. Although the SPEC-FP benchmark suite is considered representative of FP workloads, it contains a much smaller proportion of integer computation instructions than the Sandia scientific applications: 22.9% compared with 36.9%. Integer instructions in the Sandia applications also behave differently from those in SPEC-FP: integer instruction outputs are reused 8.8x to 13.1x more often in the SPEC-FP benchmarks, and integer dataflow in the Sandia applications is more complex than in the SPEC-FP suite. In this work, we examine common dataflow and usage patterns of integer instructions, information essential for developing hardware techniques to accelerate critical scientific applications. We present statistics for SPEC-FP and Sandia applications, summarizing integer computation usage and the size, shape, and interface (number of inputs/outputs) of dataflow graphs.
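The statistics named in the abstract (integer share of the instruction mix, reuse of integer outputs, and the size, shape, and interface of dataflow graphs) can be made concrete with a small sketch. This is not the authors' measurement tool chain, which is not described here. It assumes a hypothetical, simplified dynamic trace in which each instruction records an operation class, at most one destination register, and its source registers. Under that assumption it computes the integer fraction, the mean number of consumers per integer result (one plausible reading of "output reuse"), and the instruction count, depth, and input/output interface of a chosen subgraph. The names `Instr`, `dataflow_edges`, `mean_int_reuse`, `interface`, and `depth`, the operation classes, and the exact metric definitions are all illustrative assumptions; the printed numbers describe the toy trace only, not the paper's data.

```python
"""Sketch: instruction-mix and dataflow-graph statistics from a toy trace.

Assumption: the trace format, operation classes, and metric definitions
below are hypothetical and chosen for illustration only.
"""
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op_class: str                # hypothetical classes: "int", "fp", "mem", "branch"
    dest: Optional[str]          # destination register, or None (e.g. a store)
    srcs: Tuple[str, ...] = ()   # source registers


def integer_fraction(trace):
    """Share of dynamic instructions that are integer computations."""
    return sum(ins.op_class == "int" for ins in trace) / len(trace)


def dataflow_edges(trace):
    """Return (value, consumer) pairs. A value is the index of the most recent
    earlier instruction that wrote the source register, or the register name
    itself if nothing in the trace wrote it (a live-in value)."""
    last_writer = {}
    edges = []
    for i, ins in enumerate(trace):
        for reg in dict.fromkeys(ins.srcs):  # count a repeated operand once
            edges.append((last_writer.get(reg, reg), i))
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return edges


def mean_int_reuse(trace):
    """Average number of consumers per integer result (assumed reuse metric)."""
    fanout = Counter(value for value, _ in dataflow_edges(trace))
    producers = [i for i, ins in enumerate(trace)
                 if ins.op_class == "int" and ins.dest is not None]
    if not producers:
        return 0.0
    return sum(fanout[i] for i in producers) / len(producers)


def interface(trace, nodes):
    """(inputs, outputs) of the subgraph formed by the trace indices in `nodes`.
    Inputs: distinct values read by the subgraph but produced outside it.
    Outputs: subgraph values read by some instruction outside it (values
    still live at the end of the trace are ignored in this sketch)."""
    nodes = set(nodes)
    edges = dataflow_edges(trace)
    inputs = {v for v, c in edges if c in nodes and v not in nodes}
    outputs = {v for v, c in edges if v in nodes and c not in nodes}
    return len(inputs), len(outputs)


def depth(trace, nodes):
    """Longest dependence chain (in instructions) inside the subgraph."""
    nodes = set(nodes)
    preds = {n: [] for n in nodes}
    for v, c in dataflow_edges(trace):
        if v in nodes and c in nodes:
            preds[c].append(v)
    longest = {}
    for n in sorted(nodes):  # trace order is a topological order of the DAG
        longest[n] = 1 + max((longest[p] for p in preds[n]), default=0)
    return max(longest.values(), default=0)


trace = [
    Instr("int", "r1", ("r0",)),       # 0: r1 <- f(r0); r0 is live-in
    Instr("int", "r2", ("r1",)),       # 1: r2 <- f(r1)
    Instr("int", "r3", ("r1", "r2")),  # 2: r3 <- f(r1, r2), used as an address
    Instr("mem", "f1", ("r3",)),       # 3: load f1 from [r3]
    Instr("fp", "f2", ("f1", "f1")),   # 4: f2 <- f1 * f1
    Instr("mem", None, ("r3", "f2")),  # 5: store f2 to [r3]
]

print(integer_fraction(trace))          # 0.5
print(round(mean_int_reuse(trace), 2))  # 1.67
print(interface(trace, {0, 1, 2}))      # (1, 1)
print(depth(trace, {0, 1, 2}))          # 3
```

Because a dynamic trace always lists a producer before its consumers, trace order is already a topological order of the dataflow graph, so `depth` only needs to visit the subgraph's indices in ascending order. A real study would also have to decide how to treat memory dependences, flags, and values that remain live when the trace ends; this sketch tracks register dataflow only.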