ABSTRACT
Because designing a new processor takes years, one set of benchmark programs may be used during the design phase while another is the standard by the time the design is delivered. Designing a processor with one benchmark suite and evaluating its final performance with a different, presumably more current, suite may lead to sub-optimal design decisions if the characteristics of the two suites and their respective compilers differ substantially. We call these changes over time "drift". To evaluate the impact of using yesterday's benchmark and compiler technology to design tomorrow's processors, we compare common benchmarks from the SPEC 95 and SPEC 2000 benchmark suites. Our results yield three key conclusions. First, we show that the amount of drift for common programs in successive SPEC benchmark suites is significant. In SPEC 2000, the main memory access time is a far more significant performance bottleneck than in SPEC 95, while the bottlenecks that are less significant in SPEC 2000 include the L2 cache latency, the L1 I-cache size, and the number of reorder buffer entries. Second, using two different statistical techniques, we show that compiler drift is not as significant as benchmark drift. Third, we show that benchmark and compiler drift can have a significant impact on the final design decisions. Specifically, we use a one-parameter-at-a-time optimization algorithm to design two different year-2000 processors, one optimized for SPEC 95 and the other optimized for SPEC 2000, using the energy-delay product (EDP) as the optimization criterion. The results show that using SPEC 95 to design a year-2000 processor results in an 18.5% larger EDP and a 20.8% higher CPI than using the SPEC 2000 benchmarks to design the corresponding processor. Finally, we make a few recommendations to help computer architects minimize the effects of benchmark and compiler drift.
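As a rough illustration of the optimization step mentioned in the abstract, the following Python sketch shows one way a one-parameter-at-a-time search that minimizes the energy-delay product (energy multiplied by delay) could look. It is a sketch under stated assumptions, not the procedure used in the paper: the parameter names (`rob_entries`, `l1i_kb`), the candidate values, and the `toy_edp` cost model are all hypothetical stand-ins for a real simulator run over a benchmark suite.

```python
from typing import Callable, Dict, List

Config = Dict[str, int]


def energy_delay_product(energy: float, delay: float) -> float:
    """EDP is the product of energy consumed and execution time."""
    return energy * delay


def optimize_one_at_a_time(
    baseline: Config,
    candidates: Dict[str, List[int]],
    cost: Callable[[Config], float],
) -> Config:
    """Sweep each parameter in turn while holding the others fixed.

    The best value found for a parameter is kept before moving on to
    the next one, so parameter interactions are not explored.
    """
    best = dict(baseline)
    best_cost = cost(best)
    for name, values in candidates.items():
        for value in values:
            trial = dict(best)
            trial[name] = value
            trial_cost = cost(trial)
            if trial_cost < best_cost:
                best, best_cost = trial, trial_cost
    return best


def toy_edp(config: Config) -> float:
    """Hypothetical cost model (an assumption, not measured data).

    Larger structures shorten the delay but cost more energy.
    """
    delay = 1.0 + 64.0 / config["rob_entries"] + 32.0 / config["l1i_kb"]
    energy = 1.0 + 0.01 * config["rob_entries"] + 0.02 * config["l1i_kb"]
    return energy_delay_product(energy, delay)


if __name__ == "__main__":
    start = {"rob_entries": 16, "l1i_kb": 8}
    search_space = {
        "rob_entries": [16, 32, 64, 128],
        "l1i_kb": [8, 16, 32, 64],
    }
    chosen = optimize_one_at_a_time(start, search_space, toy_edp)
    print(chosen, toy_edp(chosen))
```

In this framing, designing "for SPEC 95" versus "for SPEC 2000" amounts to swapping the cost function for one driven by the other suite. The two searches may then settle on different configurations, which is the effect of drift the abstract quantifies.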