ABSTRACT
Simulating chip-multiprocessor (CMP) systems can take a long time. For single-threaded workloads, earlier work has shown that phase analysis, that is, the identification of repetitive program behaviors, can reduce overall simulation time while keeping the loss in accuracy acceptable. For multithreaded workloads, the combination of phases from all executing threads must be taken into consideration, since inter-thread interference may distort the homogeneity of each phase's true performance. Unfortunately, phase analysis does not work for multithreaded (MT) workloads because the number of possible phase combinations in an inherently nondeterministic execution model grows exponentially with the number of threads. To address this, we propose a new technique that reduces the number of simulation samples by synthesizing samples from similar phase combinations. We present a simple cost function for measuring the similarity between phase combinations; using the individual thread samples from similar phase combinations, a new sample can be constructed. The threshold on this cost function provides a convenient control knob for trading off simulation speed against accuracy. Our experimental results show that in most cases, properly setting the cost function's threshold can reduce sampling by 90% while keeping error below 5%.
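The idea of synthesizing a sample from similar phase combinations can be illustrated with a minimal sketch. The abstract does not define the cost function, so everything below is an assumption made for illustration: the cost is taken to be the number of threads whose phase IDs differ, a sample is taken to be one metric value per thread (for example IPC), and the names `mismatch_cost` and `synthesize_sample` are hypothetical. It is not the paper's actual method.

```python
from typing import Dict, Optional, Tuple

PhaseCombo = Tuple[int, ...]   # one phase ID per thread (assumed representation)
Sample = Tuple[float, ...]     # one metric value per thread, e.g. IPC (assumed)


def mismatch_cost(a: PhaseCombo, b: PhaseCombo) -> int:
    """Hypothetical cost: the number of threads whose phase IDs differ."""
    if len(a) != len(b):
        raise ValueError("phase combinations must cover the same threads")
    return sum(1 for x, y in zip(a, b) if x != y)


def synthesize_sample(
    target: PhaseCombo,
    simulated: Dict[PhaseCombo, Sample],
    threshold: int,
) -> Optional[Sample]:
    """Build a sample for `target` from already simulated combinations.

    For each thread, reuse that thread's value from the lowest-cost
    simulated combination (within `threshold`) in which the thread runs
    the same phase. Return None if some thread cannot be covered, which
    means `target` would have to be simulated directly.
    """
    if target in simulated:
        return simulated[target]

    # Candidate donors, cheapest first.
    near = sorted(
        (mismatch_cost(target, combo), combo)
        for combo in simulated
        if mismatch_cost(target, combo) <= threshold
    )

    result = []
    for thread, phase in enumerate(target):
        donor = next((c for _, c in near if c[thread] == phase), None)
        if donor is None:
            return None
        result.append(simulated[donor][thread])
    return tuple(result)


# Two simulated combinations for a two-thread workload (made-up values).
simulated = {(0, 0): (1.0, 2.0), (1, 1): (3.0, 4.0)}
print(synthesize_sample((0, 1), simulated, threshold=1))
print(synthesize_sample((0, 1), simulated, threshold=0))
```

In this sketch a larger threshold admits more donor combinations, so fewer combinations need to be simulated directly, at the risk of reusing values measured under less similar interference. That mirrors the speed versus accuracy knob the abstract describes.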