ABSTRACT
Load imbalance causes significant performance degradation in High Performance Computing (HPC) applications. In our previous work we showed that load imbalance can be alleviated by modern multithreaded (MT) processors that provide mechanisms for controlling the allocation of the processor's internal resources. In that work, we applied static, hand-tuned resource allocations to balance HPC applications, obtaining improvements for both benchmarks and real applications. In this paper we propose a dynamic process scheduler for the Linux kernel that automatically and transparently balances HPC applications according to their behavior. We tested our new scheduler on an IBM POWER5 machine, which provides a software-controlled prioritization mechanism that allows us to bias the processor resource allocation. Our experiments show that the scheduler reduces the imbalance of HPC applications, achieving results similar to those obtained by hand-tuning the applications (up to 16%). Moreover, our solution reduces the application's execution time through the combined effect of load balancing and highly responsive scheduling.
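The balancing idea summarized in the abstract can be illustrated with a minimal user-space sketch. This is not the scheduler proposed in the paper: the `Task` type, the `busy_fraction` metric, the threshold, the default priority, and the priority bounds are all assumptions chosen for illustration. The sketch only shows the general shape of a feedback step that gives more processor resources to the task that computes the most and fewer to the task that waits the most.

```python
from dataclasses import dataclass

# Hypothetical priority bounds; the real range depends on the processor
# and on the privilege level of the code that sets the priority.
MIN_PRIO, MAX_PRIO = 1, 6


@dataclass
class Task:
    name: str
    busy_fraction: float  # assumed metric: share of the last interval spent computing (0..1)
    prio: int = 4         # assumed default hardware priority


def rebalance(tasks, threshold=0.10):
    """Raise the busiest task's priority by one step and lower the least
    busy task's priority by one step, staying within the bounds.

    Does nothing when the spread in busy_fraction is within `threshold`.
    Returns True if any priority was changed.
    """
    busiest = max(tasks, key=lambda t: t.busy_fraction)
    idlest = min(tasks, key=lambda t: t.busy_fraction)
    if busiest.busy_fraction - idlest.busy_fraction <= threshold:
        return False  # already balanced enough; leave priorities alone

    changed = False
    if busiest.prio < MAX_PRIO:
        busiest.prio += 1
        changed = True
    if idlest.prio > MIN_PRIO:
        idlest.prio -= 1
        changed = True
    return changed
```

Calling `rebalance` once per measurement interval would move priorities gradually, one step at a time, which is one plausible way to avoid oscillation. A real kernel implementation would also need a way to measure each process's behavior and to apply the chosen priority to the hardware, neither of which is modeled here.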