Automated, scalable debugging of MPI programs with Intel® Message Checker
Source
International Conference on Software Engineering
Proceedings of the second international workshop on Software engineering for high performance computing system applications
St. Louis, Missouri
WORKSHOP SESSION: Verification
Pages: 78-82
Year of Publication: 2005
ISBN: 1-59593-117-1
ABSTRACT
The trend towards many-core multi-processor systems and clusters will make systems with tens and hundreds of processors more widely available. Current manual debugging techniques do not scale well to such large systems. Advanced automated debugging tools are needed for standard programming models based on commodity computing, such as threads and MPI. We surveyed MPI users to identify the kinds of MPI errors that they encounter, and classify the errors into several types. We describe how automated tools can detect such errors and present the Intel® Message Checker (IMC) technology being developed at the Intel Advanced Computing Center. IMC's unique technology automatically detects several kinds of MPI errors such as various types of mismatches, race conditions, deadlocks and potential deadlocks, and resource misuse. Finally, we review the usability and uniqueness of IMC and discuss our future plans.
CITED BY 6
Richard Vuduc, Martin Schulz, Dan Quinlan, Bronis de Supinski, Andreas Sæbjørnsen, Improving distributed memory applications testing by message perturbation, Proceeding of the 2006 workshop on Parallel and distributed systems: testing and debugging, July 17-17, 2006, Portland, Maine, USA

Xuezheng Liu, Zhenyu Guo, Xi Wang, Feibo Chen, Xiaochen Lian, Jian Tang, Ming Wu, M. Frans Kaashoek, Zheng Zhang, D3S: debugging deployed distributed systems, Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, p.423-437, April 16-18, 2008, San Francisco, California

Ruini Xue, Xuezheng Liu, Ming Wu, Zhenyu Guo, Wenguang Chen, Weimin Zheng, Zheng Zhang, Geoffrey Voelker, MPIWiz: subgroup reproducible replay of MPI applications, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 14-18, 2009, Raleigh, NC, USA