ABSTRACT
While architecture simulation is often treated as a methodology issue, it is at the core of most processor architecture research, and simulation speed is often the bottleneck of the typical trial-and-error research process. To speed up simulation during this process and get trends faster, researchers usually reduce the trace size. More sophisticated techniques like trace sampling or distributed simulation are scarcely used because they are considered unreliable and complex due to their impact on accuracy and the associated warm-up issues.

In this article, we present DiST, a practical distributed simulation scheme where, unlike in other simulation techniques that trade accuracy for speed, the user is relieved from most accuracy issues thanks to an automatic and dynamic mechanism for adjusting the warm-up interval size. Moreover, the mechanism is designed to always favor accuracy over speedup. The speedup scales with the amount of available computing resources, reaching an average speedup of 7.35 on 10 machines with an average IPC error of 1.81% and a maximum IPC error of 5.06%.

Besides proposing a solution to the warm-up issues in distributed simulation, we experimentally show that our technique is significantly more accurate than trace size reduction or trace sampling for identical speedups. We also show that not only does the error always remain small for IPC and other metrics, but a researcher can also reliably base research decisions on DiST simulation results. Finally, we explain how the DiST tool is designed to be easily pluggable into existing architecture simulators with very few modifications.