MPIPP: an automatic profile-guided parallel process placement toolset for SMP clusters and multiclusters
Source: International Conference on Supercomputing
Proceedings of the 20th annual international conference on Supercomputing
Cairns, Queensland, Australia
SESSION: Scheduling and mapping
Pages: 353-360
Year of Publication: 2006
ISBN: 1-59593-282-8
Authors
Hu Chen  Intel China Research Center
Wenguang Chen  Tsinghua University
Jian Huang  Advanced Parallel Software Platforms, Intel Corp.
Bob Robert  Advanced Parallel Software Platforms, Intel Corp.
H. Kuhn  Advanced Parallel Software Platforms, Intel Corp.
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1183401.1183451
ABSTRACT

SMP clusters and multiclusters are widely used to execute message-passing parallel applications. How parallel processes are mapped to processors (or cores) can significantly affect application performance, because communication costs in such systems are non-uniform. A tool that maps parallel processes to processors (or cores) automatically is therefore desirable. Although there have been various efforts to address this issue, existing solutions either require intensive user intervention or cannot handle multiclusters well.

In this paper, we propose a profile-guided approach that automatically finds an optimized mapping to minimize the cost of point-to-point communications for arbitrary message-passing applications. The implemented toolset is called MPIPP (MPI Process Placement toolset), and it includes several components:
1) A tool to get the communication profile of MPI applications
2) A tool to get the network topology of target clusters
3) An algorithm to find an optimized mapping, which is more effective than existing graph partition algorithms, especially for multiclusters

We evaluated the performance of our tool with the NPB benchmarks and three other applications on several clusters. Experimental results show that the optimized process placement generated by our tools can achieve significant speedup.
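To make the mapping problem concrete, here is a minimal sketch. It is not the MPIPP algorithm, whose details the abstract does not give. It assumes the communication profile is a symmetric matrix of traffic between process pairs and the topology is a matrix of per-unit communication cost between cores, and it improves a placement with a plain pairwise-swap local search. The names `mapping_cost`, `improve_by_swaps`, `traffic`, and `distance`, and all numbers, are hypothetical.

```python
from itertools import combinations


def mapping_cost(traffic, distance, placement):
    """Total point-to-point cost: traffic between each process pair,
    weighted by the distance between the cores they are placed on."""
    n = len(traffic)
    return sum(
        traffic[i][j] * distance[placement[i]][placement[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )


def improve_by_swaps(traffic, distance, placement):
    """Illustrative local search (an assumption, not the paper's method):
    keep any swap of two processes' cores that lowers the total cost,
    and stop when a full pass finds no improving swap."""
    placement = list(placement)
    best = mapping_cost(traffic, distance, placement)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(range(len(placement)), 2):
            placement[a], placement[b] = placement[b], placement[a]
            cost = mapping_cost(traffic, distance, placement)
            if cost < best:
                best = cost
                improved = True
            else:
                # Undo the swap because it did not help.
                placement[a], placement[b] = placement[b], placement[a]
    return placement, best


# Hypothetical topology: cores 0-1 share one SMP node, cores 2-3 another.
# Cost is 1 within a node and 10 across nodes.
distance = [
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
]

# Hypothetical profile: processes 0 and 2 exchange heavy traffic, as do 1 and 3.
traffic = [
    [0, 1, 100, 1],
    [1, 0, 1, 100],
    [100, 1, 0, 1],
    [1, 100, 1, 0],
]

initial = [0, 1, 2, 3]  # process i starts on core i
initial_cost = mapping_cost(traffic, distance, initial)
placement, cost = improve_by_swaps(traffic, distance, initial)
print(initial_cost, placement, cost)
```

In this toy case the search ends with each heavily communicating pair on the same node. A local search like this can stall in a local minimum on larger inputs, which is one reason graph partitioning and other heuristics are used for this problem.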


