Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System
Source
Conference on High Performance Networking and Computing
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Page: 10
Year of Publication: 2003
ISBN: 1-58113-695-1
Authors
Terry Jones, Lawrence Livermore National Laboratory, Livermore, CA
Shawn Dawson, Lawrence Livermore National Laboratory, Livermore, CA
Rob Neely, Lawrence Livermore National Laboratory, Livermore, CA
William Tuel, International Business Machines Corporation, Armonk, NY
Larry Brenner, International Business Machines Corporation, Armonk, NY
Jeffrey Fier, International Business Machines Corporation, Armonk, NY
Robert Blackmore, International Business Machines Corporation, Armonk, NY
Patrick Caffrey, International Business Machines Corporation, Armonk, NY
Brian Maskell, Atomic Weapons Establishment, Aldermaston, Reading, UK
Paul Tomlinson, Atomic Weapons Establishment, Aldermaston, Reading, UK
Mark Roberts, Atomic Weapons Establishment, Aldermaston, Reading, UK
Publisher
IEEE Computer Society, Washington, DC, USA
ABSTRACT
A parallel application benefits from scheduling policies that include a global perspective of the application's process working set. As the interactions among cooperating processes increase, mechanisms to ameliorate waiting within one or more of the processes become more important. In particular, collective operations such as barriers and reductions are extremely sensitive to even usually harmless events such as context switches among members of the process working set. For the last 18 months, we have been researching the impact of random short-lived interruptions such as timer-decrement processing and periodic daemon activity, and developing strategies to minimize their impact on large processor-count SPMD bulk-synchronous programming styles. We present a novel co-scheduling scheme for improving performance of fine-grain collective activities such as barriers and reductions, describe an implementation consisting of operating system kernel modifications and a run-time system, and present a set of empirical results comparing the technique with traditional operating system scheduling. Our results indicate a speedup of over 300% on synchronizing collectives.
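The sensitivity of barriers to short, random interruptions can be illustrated with a toy model. The Python sketch below is not the authors' implementation, and every name and parameter in it (`simulate`, `expected_step_time`, `noise_prob`, `noise_len`, and so on) is a hypothetical chosen for illustration. It assumes each bulk-synchronous step ends in a barrier, so a step takes as long as its slowest process. The `aligned` flag is a simplified stand-in for co-scheduling, in which interruptions occur on all processes at the same time rather than independently.

```python
import random


def simulate(num_procs, steps, work, noise_prob, noise_len, aligned, seed=0):
    """Total time for `steps` barrier-terminated iterations (toy model).

    Each step, a process may suffer one interruption of length `noise_len`
    with probability `noise_prob`. The barrier makes the step as slow as
    the slowest process.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        if aligned:
            # Assumed co-scheduled case: one shared draw, so every process
            # is interrupted in the same window or not at all.
            hit = rng.random() < noise_prob
            delays = [noise_len if hit else 0.0] * num_procs
        else:
            # Uncoordinated case: each process is interrupted independently.
            delays = [
                noise_len if rng.random() < noise_prob else 0.0
                for _ in range(num_procs)
            ]
        total += work + max(delays)
    return total


def expected_step_time(num_procs, work, noise_prob, noise_len, aligned):
    """Expected duration of one step under the same toy model."""
    if aligned:
        p_any = noise_prob
    else:
        # Probability that at least one of the processes is interrupted.
        p_any = 1.0 - (1.0 - noise_prob) ** num_procs
    return work + noise_len * p_any


if __name__ == "__main__":
    for n in (1, 16, 256, 4096):
        uncoordinated = expected_step_time(n, 1.0, 0.01, 0.5, aligned=False)
        coordinated = expected_step_time(n, 1.0, 0.01, 0.5, aligned=True)
        print(n, round(uncoordinated, 4), round(coordinated, 4))
```

In this model the chance that some process delays the barrier is 1 - (1 - p)^N for N independent processes, which approaches 1 as N grows, whereas aligned interruptions cost the same regardless of N. This is only a sketch of the scaling argument; the measured speedups reported in the abstract come from the authors' kernel and run-time changes, not from this model.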