Lessons learned at 208K: towards debugging millions of cores

Source
Conference on High Performance Networking and Computing
Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, Volume 00
Austin, Texas
Article No. 26
Year of Publication: 2008
ISBN: 978-1-4244-2835-9

Authors
Gregory L. Lee, Lawrence Livermore National Laboratory, Livermore, CA
Dong H. Ahn, Lawrence Livermore National Laboratory, Livermore, CA
Dorian C. Arnold, University of Wisconsin, Madison, WI
Bronis R. de Supinski, Lawrence Livermore National Laboratory, Livermore, CA
Matthew Legendre, University of Wisconsin, Madison, WI
Barton P. Miller, University of Wisconsin, Madison, WI
Martin Schulz, Lawrence Livermore National Laboratory, Livermore, CA
Ben Liblit, University of Wisconsin, Madison, WI

Publisher
IEEE Press, Piscataway, NJ, USA

ABSTRACT
Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, so tools must use scalable data structures and analysis algorithms to collect and process application data. In addition, at such scales each tool will itself become a large parallel application: debugging the full Blue Gene/L (BG/L) installation at Lawrence Livermore National Laboratory already requires 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks. In this paper, we present the challenges of petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes, that is, groups of processes with equivalent stack traces. We use results gathered at thousands of tasks on an InfiniBand cluster and at up to 208K processes on BG/L to identify current scalability issues as well as challenges that will arise at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. Finally, we discuss future plans to meet the debugging demands of petascale machines.
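
The core operation STAT performs (merging per-process stack traces and reading off process equivalence classes) can be illustrated with a small call-prefix tree. This is a hypothetical Python sketch, not STAT's implementation: the names `TraceTree` and `tree_reduce`, the frame names, the rank-to-daemon assignment, and the binary fan-out are all assumptions made for illustration. Because merging two trees only unions task sets along matching call paths, the same merge can be applied at every level of a tree of tool daemons, which is one common way to build a scalable communication infrastructure.

```python
import copy


class TraceTree:
    """Hypothetical call-prefix tree: each node holds the ranks whose stack passes through it."""

    def __init__(self):
        self.tasks = set()
        self.children = {}  # frame name -> TraceTree

    def add_trace(self, rank, frames):
        """Insert one stack trace for `rank`, outermost frame (e.g. main) first."""
        node = self
        node.tasks.add(rank)
        for frame in frames:
            node = node.children.setdefault(frame, TraceTree())
            node.tasks.add(rank)

    def merge(self, other):
        """Merge `other` into this tree in place by unioning task sets on matching paths."""
        self.tasks |= other.tasks
        for frame, child in other.children.items():
            if frame in self.children:
                self.children[frame].merge(child)
            else:
                # Copy so that later merges into self never mutate `other`.
                self.children[frame] = copy.deepcopy(child)
        return self

    def equivalence_classes(self, prefix=()):
        """Map each full call path to the set of ranks whose trace ends exactly there."""
        classes = {}
        below = set()
        for child in self.children.values():
            below |= child.tasks
        ended_here = self.tasks - below
        if ended_here:
            classes[prefix] = ended_here
        for frame, child in self.children.items():
            classes.update(child.equivalence_classes(prefix + (frame,)))
        return classes


def tree_reduce(trees, fanout=2):
    """Merge per-daemon trees level by level, as a tree-shaped reduction network would."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(trees)
    if not level:
        return TraceTree()
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fanout):
            parent = TraceTree()
            for tree in level[i:i + fanout]:
                parent.merge(tree)
            next_level.append(parent)
        level = next_level
    return level[0]


# Hypothetical traces for four ranks, outermost frame first.
traces = {
    0: ["main", "solve", "MPI_Waitall"],
    1: ["main", "solve", "MPI_Waitall"],
    2: ["main", "solve", "compute"],
    3: ["main", "io_write"],
}

# Assume two tool daemons, each owning two ranks and building a local tree.
daemon_trees = []
for ranks in ([0, 1], [2, 3]):
    local = TraceTree()
    for rank in ranks:
        local.add_trace(rank, traces[rank])
    daemon_trees.append(local)

root = tree_reduce(daemon_trees)
for path, ranks in sorted(root.equivalence_classes().items()):
    print(" > ".join(path), sorted(ranks))
# main > io_write [3]
# main > solve > MPI_Waitall [0, 1]
# main > solve > compute [2]
```

Taking 208K as 208 × 1024 = 212,992 processes and assuming they are spread evenly over the 1664 tool daemons mentioned above, each daemon would build a local tree from 128 traces. In this sketch the number of tree nodes passed up the reduction depends on the number of distinct call paths rather than the number of processes, although the per-node task sets still grow with scale.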