|
ABSTRACT
Although chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded applications, causes significant challenges for parallel programmers by hindering their ability to create parallel applications with repeatable results. As a consequence, parallel applications are significantly harder to debug, test, and maintain than sequential programs. This paper introduces Kendo: a new software-only system that provides deterministic multithreading of parallel applications. Kendo enforces a deterministic interleaving of lock acquisitions and specially declared non-protected reads through a novel dynamically load-balanced deterministic scheduling algorithm. The algorithm tracks the progress of each thread using performance counters to construct a deterministic logical time that is used to compute an interleaving of shared data accesses that is both deterministic and provides good load balancing. Kendo can run on today's commodity hardware while incurring only a modest performance cost. Experimental results on the SPLASH-2 applications yield a geometric mean overhead of only 16% when running on 4 processors. This low overhead makes it possible to benefit from Kendo even after an application is deployed. Programmers can start using Kendo today to program parallel applications that are easier to develop, debug, and test.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
R. Bocchino, V. Adve, S. Adve, and M. Snir. Parallel programming must be deterministic by default. Technical Report UIUCDCS-R-2008-3012, University of Illinois at Urbana-Champaign, 2008. URL http://dpj.cs.uiuc.edu/DPJ/Publications_files/paper.pdf.
|
 |
3
|
Guang-Ien Cheng , Mingdong Feng , Charles E. Leiserson , Keith H. Randall , Andrew F. Stark, Detecting data races in Cilk programs that use locks, Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures, p.298-309, June 28-July 02, 1998, Puerto Vallarta, Mexico
[doi> 10.1145/277651.277696]
|
| |
4
|
Joe Devietti, Brandon Lucia, Luis Ceze, and Mark Oskin. Explicitly parallel programming with shared-memory is insane: At least make it deterministic! In proceedings of SHCMP 2008: Workshop on Software and Hardware Challenges of Manycore Platforms, Beijing, China, June 22, 2008.
|
| |
5
|
|
| |
6
|
Jorg Domaschka, Franz J. Hauck, Hans P. Reiser, and Rutiger Kapitza. Deterministic multithreading for java-based replicated objects. In proceedings of the International Conference on Parallel and Distributed Computing and Systems, 2006.
|
| |
7
|
Jorg Domaschka, Andreas I. Schmied, Hans P. Reiser, and Franz J. Hauck. Revisiting deterministic multithreading strategies. International Parallel and Distributed Processing Symposium, pages 1--8, 2007.
|
 |
8
|
George W. Dunlap , Dominic G. Lucchetti , Michael A. Fetterman , Peter M. Chen, Execution replay of multiprocessor virtual machines, Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, March 05-07, 2008, Seattle, WA, USA
[doi> 10.1145/1346256.1346273]
|
 |
9
|
Matteo Frigo , Charles E. Leiserson , Keith H. Randall, The implementation of the Cilk-5 multithreaded language, Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, p.212-223, June 17-19, 1998, Montreal, Quebec, Canada
|
 |
10
|
|
| |
11
|
Wolfgang Karl, Markus Leberecht, and Michael Oberhuber. Forcing deterministic execution of parallel programs -- debugging support through the smile monitoring approach. In proceedings of the SCI-Europe, September 1998.
|
 |
12
|
|
 |
13
|
|
| |
14
|
|
| |
15
|
|
 |
16
|
Shan Lu , Joseph Tucek , Feng Qin , Yuanyuan Zhou, AVIO: detecting atomicity violations via access interleaving invariants, Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, October 21-25, 2006, San Jose, California, USA
|
 |
17
|
|
 |
18
|
Shan Lu , Soyeon Park , Chongfeng Hu , Xiao Ma , Weihang Jiang , Zhenmin Li , Raluca A. Popa , Yuanyuan Zhou, MUVI: automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs, Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, October 14-17, 2007, Stevenson, Washington, USA
|
 |
19
|
Shan Lu , Soyeon Park , Eunsoo Seo , Yuanyuan Zhou, Learning from mistakes: a comprehensive study on real world concurrency bug characteristics, Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, March 01-05, 2008, Seattle, WA, USA
|
 |
20
|
|
| |
21
|
|
| |
22
|
|
 |
23
|
|
| |
24
|
Debashis Saha and Sourav K. Dutta. Specification of deterministic execution timing schema for parallel programs on a multiprocessor. Computer, Communication, Control and Power Engineering, pages 114--116 vol.1, 1993.
|
| |
25
|
|
| |
26
|
|
 |
27
|
Joseph Tucek , Shan Lu , Chengdu Huang , Spiros Xanthos , Yuanyuan Zhou, Triage: diagnosing production run failures at the user's site, Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, October 14-17, 2007, Stevenson, Washington, USA
|
 |
28
|
|
 |
29
|
Steven Cameron Woo , Moriyoshi Ohara , Evan Torrie , Jaswinder Pal Singh , Anoop Gupta, The SPLASH-2 programs: characterization and methodological considerations, Proceedings of the 22nd annual international symposium on Computer architecture, p.24-36, June 22-24, 1995, S. Margherita Ligure, Italy
|
 |
30
|
|
|