Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System
Source: Conference on High Performance Networking and Computing
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Page: 10  
Year of Publication: 2003
ISBN: 1-58113-695-1
Authors
Terry Jones  Lawrence Livermore National Laboratory, Livermore, CA
Shawn Dawson  Lawrence Livermore National Laboratory, Livermore, CA
Rob Neely  Lawrence Livermore National Laboratory, Livermore, CA
William Tuel  International Business Machines Corporation, Armonk, NY
Larry Brenner  International Business Machines Corporation, Armonk, NY
Jeffrey Fier  International Business Machines Corporation, Armonk, NY
Robert Blackmore  International Business Machines Corporation, Armonk, NY
Patrick Caffrey  International Business Machines Corporation, Armonk, NY
Brian Maskell  Atomic Weapons Establishment, Aldermaston Reading, UK
Paul Tomlinson  Atomic Weapons Establishment, Aldermaston Reading, UK
Mark Roberts  Atomic Weapons Establishment, Aldermaston Reading, UK
Sponsor
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
IEEE Computer Society  Washington, DC, USA

ABSTRACT

A parallel application benefits from scheduling policies that include a global perspective of the application's process working set. As the interactions among cooperating processes increase, mechanisms to ameliorate waiting within one or more of the processes become more important. In particular, collective operations such as barriers and reductions are extremely sensitive to even usually harmless events, such as context switches among members of the process working set. For the last 18 months, we have been researching the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, and developing strategies to minimize their impact on large-processor-count SPMD bulk-synchronous programming styles. We present a novel co-scheduling scheme for improving the performance of fine-grain collective activities such as barriers and reductions, describe an implementation consisting of operating system kernel modifications and a run-time system, and present a set of empirical results comparing the technique with traditional operating system scheduling. Our results indicate a speedup of over 300% on synchronizing collectives.
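
The sensitivity of barriers to short random interruptions can be illustrated with a toy model. The sketch below is not the authors' implementation or their measurement method. It assumes that each process is independently interrupted during a step with a small probability, that a barrier completes only when the slowest process arrives, and that co-scheduling is idealized as all processes taking the interruption in the same window. The function names and the parameters `work`, `noise_prob`, and `noise_cost` are hypothetical values chosen for illustration.

```python
import random


def step_time(n_procs, coscheduled, rng, work=1.0, noise_prob=0.01, noise_cost=0.5):
    """Duration of one compute-then-barrier step in the toy model.

    Assumption: an interruption adds a fixed `noise_cost` to the process it hits.
    """
    if coscheduled:
        # Idealized co-scheduling: every process is interrupted in the same
        # window, so the step pays for at most one interruption.
        interrupted = rng.random() < noise_prob
    else:
        # Uncoordinated: the barrier waits for the slowest process, so one
        # interrupted process out of n_procs is enough to delay the step.
        interrupted = any(rng.random() < noise_prob for _ in range(n_procs))
    return work + (noise_cost if interrupted else 0.0)


def mean_step_time(n_procs, coscheduled, steps=2000, seed=0):
    """Average step duration over `steps` simulated barrier steps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        total += step_time(n_procs, coscheduled, rng)
    return total / steps


if __name__ == "__main__":
    for n in (1, 64, 1024):
        uncoordinated = mean_step_time(n, coscheduled=False)
        coordinated = mean_step_time(n, coscheduled=True)
        print(f"{n:5d} processes: uncoordinated={uncoordinated:.3f} "
              f"co-scheduled={coordinated:.3f}")
```

In this model the chance that at least one process is interrupted in a step grows toward certainty as the process count rises, so the uncoordinated step time approaches `work + noise_cost` while the co-scheduled step time stays near `work`. The model ignores network latency, the cost of the collective itself, and overlapping interruptions.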

