|
ABSTRACT
In message-passing parallel applications, messages are not delivered in a strict order. The number of messages, their content and their destination may depend on the ordering of their delivery. Nevertheless, for most applications, the computation results should be the same for all possible orderings. Finding an ordering that produces a different outcome or that prevents the execution from terminating reveals a message race or a deadlock. Starting from the initial application state, we dynamically build an acyclic message-passing state graph such that each path within the graph represents one possible message ordering. All paths lead to the same final state if no deadlock or message race exists. If multiple final states are reached, we reveal message orderings that produce the different outcomes. The corresponding executions may then be replayed for debugging purposes. We reduce the number of states to be explored by using previously acquired knowledge about communication patterns and about how operations read and modify local process variables. We also describe a heuristic that tests a subset of orderings that are likely to reveal existing message races or deadlocks. We applied our approach on several applications developed using the Dynamic Parallel Schedules (DPS) parallelization framework. Compared to the naive execution of all message orderings, the use of a message-passing state graph reduces the cost of testing all orderings by several orders of magnitude. The use of prior information further reduces the number of visited states by a factor of up to fifty in our tests. The heuristic relying on a subset of orderings was able to reveal race conditions in all tested cases. We finally present a first step in generalizing the approach to MPI applications.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
A. Bouteiller, G. Bosilca, J. Dongarra, Retrospect: Deterministic Replay of MPI Applications for Interactive Distributed Debugging, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science (LNCS), Vol. 4757, pp. 297--306, Springer Verlag, 2007, doi:10.1007/978-3-540-75416-9_41
|
| |
2
|
F. Chen, G. Roşu, Parametric and Sliced Causality, Proceedings of the 19th Conference on Computer Aided Verification (CAV'07), Lecture Notes In Computer Science (LNCS), Vol. 4590, Springer, pp 240--253, 2007
|
| |
3
|
|
 |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
G. H. Golub, C. F. van Loan, Matrix Computations, The Johns Hopkins University Press, pp. 94--116, 1996
|
| |
8
|
|
| |
9
|
|
 |
10
|
|
 |
11
|
|
| |
12
|
|
 |
13
|
Robert H. B. Netzer , Timothy W. Brennan , Suresh K. Damodaran-Kamal, Debugging race conditions in message-passing programs, Proceedings of the SIGMETRICS symposium on Parallel and distributed tools, p.31-40, May 22-23, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/238020.238033]
|
| |
14
|
M. Otta, S. Racek, A method for testing and debugging distributed applications, Int'l Conference on Trends in Communications (EUROCON'2001), vol. 2, pp. 548--551, July 2001
|
| |
15
|
|
 |
16
|
|
| |
17
|
J. F. Ruscio, M. A. Heffner, S. Varadarajan, DejaVu: Transparent User-Level Checkpointing, Migration, and Recovery for Distributed Systems, Proc. 21st Int'l Parallel and Distributed Processing Symposium (IPDPS'07), pp. 1--8, Long Beach, USA, March 2007.
|
| |
18
|
S. Sankaran, J. M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, E. Roman}, The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing, Int'l Journal of High Performance Computing Applications, Vol. 19, No. 4, pp. 479--493, 2005
|
| |
19
|
B. Schaeli, S. Gerlach, R. D. Hersch, Decomposing Partial Order Execution Graphs to Improve Message Race Detection, Proc. 21st Int'l Parallel and Distributed Processing Symposium (IPDPS'07), pp. 1--8, Long Beach, USA, March 2007.
|
 |
20
|
|
| |
21
|
S. F. Siegel, G. S. Avrunin, Verification of halting properties for MPI programs using nonblocking operations, Proc. 14th European PVM/MPI Users' Group Conference, 2007.
|
 |
22
|
Sarvani S. Vakkalanka , Subodh Sharma , Ganesh Gopalakrishnan , Robert M. Kirby, ISP: a tool for model checking MPI programs, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, February 20-23, 2008, Salt Lake City, UT, USA
[doi> 10.1145/1345206.1345258]
|
| |
23
|
|
|