ABSTRACT
SMP clusters and multiclusters are widely used to execute message-passing parallel applications. Because communication cost in such systems is non-uniform, the way parallel processes are mapped to processors (or cores) can significantly affect application performance. A tool that maps parallel processes to processors (or cores) automatically is therefore desirable.

Although there have been various efforts to address this issue, the existing solutions either require intensive user intervention or cannot handle multiclusters well.

In this paper, we propose a profile-guided approach that automatically finds an optimized mapping to minimize the cost of point-to-point communications for arbitrary message-passing applications. The implemented toolset is called MPIPP (MPI Process Placement toolset), and it includes several components:
1) A tool to get the communication profile of MPI applications
2) A tool to get the network topology of target clusters
3) An algorithm to find an optimized mapping, which is more effective than existing graph partition algorithms, especially for multiclusters.

We evaluated the performance of our tool with the NPB benchmarks and three other applications on several clusters. Experimental results show that the optimized process placement generated by our tools can achieve significant speedup.
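
To make the mapping problem concrete, here is a minimal sketch of the kind of objective such a tool could minimize. It is not the MPIPP algorithm, which the abstract does not spell out. The sketch assumes the communication profile is a symmetric matrix `comm` (traffic between each pair of processes) and the topology is a matrix `dist` (relative cost between each pair of cores). The function names, the sample data, and the greedy pairwise-exchange heuristic are all illustrative assumptions.

```python
from itertools import combinations


def mapping_cost(comm, dist, placement):
    """Total point-to-point cost: traffic between each process pair,
    weighted by the distance between the cores they are placed on."""
    n = len(comm)
    return sum(
        comm[i][j] * dist[placement[i]][placement[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )


def improve_by_swaps(comm, dist, placement):
    """Hypothetical greedy heuristic: exchange the cores of two processes
    whenever doing so strictly lowers the cost; stop when no swap helps."""
    placement = list(placement)
    best = mapping_cost(comm, dist, placement)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(range(len(placement)), 2):
            placement[a], placement[b] = placement[b], placement[a]
            cost = mapping_cost(comm, dist, placement)
            if cost < best:
                best = cost
                improved = True
            else:
                # Undo the swap: it did not reduce the cost.
                placement[a], placement[b] = placement[b], placement[a]
    return placement, best


# Assumed sample data: 4 processes; processes 0<->2 and 1<->3 talk heavily.
comm = [
    [0, 1, 100, 1],
    [1, 0, 1, 100],
    [100, 1, 0, 1],
    [1, 100, 1, 0],
]
# Assumed topology: cores 0,1 share one node, cores 2,3 share another;
# intra-node cost 1, inter-node cost 10.
dist = [
    [0, 1, 10, 10],
    [1, 0, 10, 10],
    [10, 10, 0, 1],
    [10, 10, 1, 0],
]

default_placement = [0, 1, 2, 3]  # process i runs on core i
tuned_placement, tuned_cost = improve_by_swaps(comm, dist, default_placement)
```

With this sample data, the default rank-order placement puts each heavily communicating pair on different nodes, and the swap heuristic moves each pair onto a shared node. A greedy exchange like this can stall in a local minimum on larger inputs, which is one reason more elaborate search strategies are used in practice.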