ACM Home Page
Please provide us with feedback. Feedback
Dynamic testing of flow graph based parallel applications
Full text PdfPdf (226 KB)
Source International Symposium on Software Testing and Analysis archive
Proceedings of the 6th workshop on Parallel and distributed systems: testing, analysis, and debugging table of contents
Seattle, Washington
Article No. 2  
Year of Publication: 2008
ISBN:978-1-60558-052-4
Authors
Basile Schaeli  Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Roger D. Hersch  Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 60,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1390841.1390843
What is a DOI?

ABSTRACT

In message-passing parallel applications, messages are not delivered in a strict order. The number of messages, their content and their destination may depend on the ordering of their delivery. Nevertheless, for most applications, the computation results should be the same for all possible orderings. Finding an ordering that produces a different outcome or that prevents the execution from terminating reveals a message race or a deadlock. Starting from the initial application state, we dynamically build an acyclic message-passing state graph such that each path within the graph represents one possible message ordering. All paths lead to the same final state if no deadlock or message race exists. If multiple final states are reached, we reveal message orderings that produce the different outcomes. The corresponding executions may then be replayed for debugging purposes. We reduce the number of states to be explored by using previously acquired knowledge about communication patterns and about how operations read and modify local process variables. We also describe a heuristic that tests a subset of orderings that are likely to reveal existing message races or deadlocks. We applied our approach on several applications developed using the Dynamic Parallel Schedules (DPS) parallelization framework. Compared to the naive execution of all message orderings, the use of a message-passing state graph reduces the cost of testing all orderings by several orders of magnitude. The use of prior information further reduces the number of visited states by a factor of up to fifty in our tests. The heuristic relying on a subset of orderings was able to reveal race conditions in all tested cases. We finally present a first step in generalizing the approach to MPI applications.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
A. Bouteiller, G. Bosilca, J. Dongarra, Retrospect: Deterministic Replay of MPI Applications for Interactive Distributed Debugging, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science (LNCS), Vol. 4757, pp. 297--306, Springer Verlag, 2007, doi:10.1007/978-3-540-75416-9_41
 
2
F. Chen, G. Roşu, Parametric and Sliced Causality, Proceedings of the 19th Conference on Computer Aided Verification (CAV'07), Lecture Notes In Computer Science (LNCS), Vol. 4590, Springer, pp 240--253, 2007
 
3
4
 
5
 
6
 
7
G. H. Golub, C. F. van Loan, Matrix Computations, The Johns Hopkins University Press, pp. 94--116, 1996
 
8
 
9
10
11
 
12
13
 
14
M. Otta, S. Racek, A method for testing and debugging distributed applications, Int'l Conference on Trends in Communications (EUROCON'2001), vol. 2, pp. 548--551, July 2001
 
15
16
 
17
J. F. Ruscio, M. A. Heffner, S. Varadarajan, DejaVu: Transparent User-Level Checkpointing, Migration, and Recovery for Distributed Systems, Proc. 21st Int'l Parallel and Distributed Processing Symposium (IPDPS'07), pp. 1--8, Long Beach, USA, March 2007.
 
18
S. Sankaran, J. M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, E. Roman}, The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing, Int'l Journal of High Performance Computing Applications, Vol. 19, No. 4, pp. 479--493, 2005
 
19
B. Schaeli, S. Gerlach, R. D. Hersch, Decomposing Partial Order Execution Graphs to Improve Message Race Detection, Proc. 21st Int'l Parallel and Distributed Processing Symposium (IPDPS'07), pp. 1--8, Long Beach, USA, March 2007.
20
 
21
S. F. Siegel, G. S. Avrunin, Verification of halting properties for MPI programs using nonblocking operations, Proc. 14th European PVM/MPI Users' Group Conference, 2007.
22
 
23

Collaborative Colleagues:
Basile Schaeli: colleagues
Roger D. Hersch: colleagues