ABSTRACT
Benchmarks set standards for innovation in computer architecture research and industry product development. Consequently, it is of paramount importance that these workloads are representative of real-world applications. However, composing such representative workloads poses practical challenges to application analysis teams and benchmark developers: (1) real-world workloads are intellectual property, and vendors hesitate to share these proprietary applications; and (2) porting and reducing these applications to benchmarks that can be simulated in a tractable amount of time is a nontrivial task. In this paper, we address this problem by proposing a technique that automatically distills key inherent behavioral attributes of a proprietary workload and captures them in a miniature synthetic benchmark clone. The advantage of the benchmark clone is that it hides the functional meaning of the code but exhibits performance characteristics similar to those of the target application. Moreover, the dynamic instruction count of the synthetic benchmark clone is substantially smaller than that of the proprietary application, greatly reducing overall simulation time; for SPEC CPU, the simulation time reduction is over five orders of magnitude compared to entire benchmark execution. Using a set of benchmarks representative of general-purpose, scientific, and embedded applications, we demonstrate that the power and performance characteristics of the synthetic benchmark clone correlate well with those of the original application across a wide range of microarchitecture configurations.