Lessons learned at 208K: towards debugging millions of cores
Source: Conference on High Performance Networking and Computing
Proceedings of the 2008 ACM/IEEE conference on Supercomputing - Volume 00
Austin, Texas
SECTION: Papers
Article No. 26  
Year of Publication: 2008
ISBN:978-1-4244-2835-9
Authors
Gregory L. Lee  Lawrence Livermore National Laboratory, Livermore, CA
Dong H. Ahn  Lawrence Livermore National Laboratory, Livermore, CA
Dorian C. Arnold  University of Wisconsin, Madison, WI
Bronis R. de Supinski  Lawrence Livermore National Laboratory, Livermore, CA
Matthew Legendre  University of Wisconsin, Madison, WI
Barton P. Miller  University of Wisconsin, Madison, WI
Martin Schulz  Lawrence Livermore National Laboratory, Livermore, CA
Ben Liblit  University of Wisconsin, Madison, WI
Publisher
IEEE Press  Piscataway, NJ, USA

ABSTRACT

Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, requiring that tools use scalable data structures and analysis algorithms to collect and process application data. In addition, at such scales, each tool itself will become a large parallel application: debugging the full BlueGene/L (BG/L) installation at the Lawrence Livermore National Laboratory already requires 1664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own tool processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks.

In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an InfiniBand cluster and results at up to 208K processes on BG/L to identify current scalability issues as well as challenges that will be faced at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.
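
The idea of merging stack traces to find process equivalence classes can be illustrated with a small sketch. This is not STAT's implementation: the function names, the dictionary-based tree, and the sample traces are all hypothetical, and the merge runs in a single process rather than across tool daemons. It only shows how ranks with identical call paths collapse into one class, and how a call-prefix tree records which ranks reached each frame.

```python
from collections import defaultdict


def equivalence_classes(traces):
    """Group ranks whose call stacks are identical.

    traces: dict mapping rank (int) -> sequence of frame names, outermost first.
    Returns a dict mapping each distinct stack (as a tuple) -> sorted list of ranks.
    """
    classes = defaultdict(list)
    for rank, stack in traces.items():
        classes[tuple(stack)].append(rank)
    return {stack: sorted(ranks) for stack, ranks in classes.items()}


def merge_prefix_tree(traces):
    """Merge all stacks into one call-prefix tree.

    Each node is a dict with the set of ranks that reached it and its children
    keyed by frame name. (Hypothetical layout chosen for illustration.)
    """
    root = {"ranks": set(), "children": {}}
    for rank, stack in traces.items():
        node = root
        node["ranks"].add(rank)
        for frame in stack:
            node = node["children"].setdefault(
                frame, {"ranks": set(), "children": {}}
            )
            node["ranks"].add(rank)
    return root


# Made-up traces for four ranks of an imaginary MPI program.
traces = {
    0: ("main", "solve", "MPI_Waitall"),
    1: ("main", "solve", "MPI_Waitall"),
    2: ("main", "solve", "MPI_Barrier"),
    3: ("main", "io_write"),
}

classes = equivalence_classes(traces)
tree = merge_prefix_tree(traces)
```

With these sample traces, ranks 0 and 1 fall into the same class, so a debugger would need to inspect only one representative of that pair. At scale, a tool would perform such merges hierarchically across its daemons instead of in one place, which is the kind of scalability concern the abstract raises.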


