|
ABSTRACT
We present a distributed algorithm for time-sharing parallel workloads that is competitive with coscheduling. Implicit scheduling allows each local scheduler in the system to make independent decisions that dynamically coordinate the scheduling of cooperating processes across processors. Of particular importance is the blocking algorithm which decides the action of a process waiting for a communication or synchronization event to complete. Through simulation of bulk-synchronous parallel applications, we find that a simple two-phase fixed-spin blocking algorithm performs well; a two-phase adaptive algorithm that gathers run-time data on barrier wait-times performs slightly better. Our results hold for a range of machine parameters and parallel program characteristics. These findings are in direct contrast to the literature that states explicit coscheduling is necessary for fine-grained programs. We show that the choice of the local scheduler is crucial, with a priority-based scheduler performing two to three times better than a round-robin scheduler. Overall, we find that the performance of implicit scheduling is near that of coscheduling (+/- 35%), without the requirement of explicit, global coordination.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Remzi H. Arpaci , David E. Culler , Arvind Krishnamurthy , Steve G. Steinberg , Katherine Yelick, Empirical evaluation of the CRAY-T3D: a compiler perspective, Proceedings of the 22nd annual international symposium on Computer architecture, p.320-331, June 22-24, 1995, S. Margherita Ligure, Italy
|
 |
2
|
Remzi H. Arpaci , Andrea C. Dusseau , Amin M. Vahdat , Lok T. Liu , Thomas E. Anderson , David A. Patterson, The interaction of parallel and sequential workloads on a network of workstations, Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, p.267-278, May 15-19, 1995, Ottawa, Ontario, Canada
|
| |
3
|
|
| |
4
|
M. J. Atallah, C. L. Black, D. C. Marinescu, H. J. Siegel, and T. L. Casavant. Models and Algorithms for Co-scheduling Compute-Intensive Tasks on a Network of Workstations Journal of Parallel and Distrbuted Computing, 16-319-327, 1992.
|
 |
5
|
Rohit Chandra , Scott Devine , Ben Verghese , Anoop Gupta , Mendel Rosenblum, Scheduling and page migration for multiprocessor compute servers, Proceedings of the sixth international conference on Architectural support for programming languages and operating systems, p.12-24, October 05-07, 1994, San Jose, California, United States
|
 |
6
|
J. Chapin , M. Rosenblum , S. Devine , T. Lahiri , D. Teodosiu , A. Gupta, Hive: fault containment for shared-memory multiprocessors, Proceedings of the fifteenth ACM symposium on Operating systems principles, p.12-25, December 03-06, 1995, Copper Mountain, Colorado, United States
|
 |
7
|
Su-Hui Chiang , Rajesh K. Mansharamani , Mary K. Vernon, Use of application characteristics and limited preemption for run-to-completion parallel processor scheduling policies, Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, p.33-44, May 16-20, 1994, Nashville, Tennessee, United States
|
| |
8
|
M. Crovella, P. Das, C. Dubnicki, T. LeBlanc, and E. Markatos. Multiprogramming on Multiprocessors. In Proceed,ngs of the Third IEEE Symposium on Parallel and D~stmbuted Processzng, pages 590-597, Dec. 1991.
|
| |
9
|
|
 |
10
|
R. Cypher , A. Ho , S. Konstantinidou , P. Messina, Architectural requirements of parallel scientific applications with explicit communication, Proceedings of the 20th annual international symposium on Computer architecture, p.2-13, May 16-19, 1993, San Diego, California, United States
|
| |
11
|
J. J. Dongarra, J. R. Bunch, C. B. Moler, and G W. Stewart LINPACK User's Guide. SIAM, Philadelphia, Penn, 1979.
|
| |
12
|
|
| |
13
|
S. Evans, K. Clarke, D. Singleton, and B. Smaalders. Optimizing Unix Resource Scheduling for User Interaction. In 1993 Summer Usen,x, pages 205-218. USENIX, June 1993.
|
| |
14
|
|
| |
15
|
D. G. Feltelson and L Rudolph. Gang Scheduling Performance Benefits for Fine-Grained Synchronization Journal of Parallel and D,stmbuted Computing, 16(4):306-318, Dec. 1992
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
 |
19
|
Anoop Gupta , Andrew Tucker , Shigeru Urushibara, The impact of operating system scheduling policies and synchronization methods of performance of parallel applications, Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems, p.120-132, May 21-24, 1991, San Diego, California, United States
|
| |
20
|
High Performance Fortran Forum. High Performance Fortran language specification version 1 0 Draft, Jan. 1993
|
 |
21
|
|
 |
22
|
Anna R. Karlin , Kai Li , Mark S. Manasse , Susan Owicki, Empirical studies of competitve spinning for a shared-memory multiprocessor, Proceedings of the thirteenth ACM symposium on Operating systems principles, p.41-55, October 13-16, 1991, Pacific Grove, California, United States
|
| |
23
|
K. Keeton, T. Anderson, and D. Patterson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Proceedings of Hot Interconnects II{, Aug. 1995
|
 |
24
|
|
| |
25
|
|
 |
26
|
|
 |
27
|
|
| |
28
|
R. P. Martin HPAM An Active Message Layer for a Network of Workstations. In Proceedzngs of Hot Interconnects {I, July 1994.
|
 |
29
|
|
 |
30
|
|
 |
31
|
|
| |
32
|
L McVoy and C. Staelin. lmbench: Portable Tools for Performance Analysis. In Proceedings of the i996 W, nter USENIX, Jan. 1996.
|
 |
33
|
|
| |
34
|
|
| |
35
|
J. K. Ousterhout. Scheduling Techniques for Concurrent Systems. In Th,rd International Conference on D~stmbuted Comput,ng Systems, pages 22-30, May 1982
|
 |
36
|
Vinod G. J. Peris , Mark S. Squillante , Vijay K. Naik, Analysis of the impact of memory in distributed parallel processing systems, Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, p.5-18, May 16-20, 1994, Nashville, Tennessee, United States
|
 |
37
|
|
 |
38
|
|
| |
39
|
|
 |
40
|
T. von Eicken , A. Basu , V. Buch , W. Vogels, U-Net: a user-level network interface for parallel and distributed computing (includes URL), Proceedings of the fifteenth ACM symposium on Operating systems principles, p.40-53, December 03-06, 1995, Copper Mountain, Colorado, United States
|
 |
41
|
Thorsten von Eicken , David E. Culler , Seth Copen Goldstein , Klaus Erik Schauser, Active messages: a mechanism for integrated communication and computation, Proceedings of the 19th annual international symposium on Computer architecture, p.256-266, May 19-21, 1992, Queensland, Australia
|
 |
42
|
Robert W. Wisniewski , Leonidas I. Kontothanassis , Michael L. Scott, High performance synchronization algorithms for multiprogrammed multiprocessors, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.199-206, July 19-21, 1995, Santa Barbara, California, United States
|
| |
43
|
J. Zahor9an and E. D. Lazowska. Spinning Versus Blocking m Parallel ~y~tems with Uncertain~y. In Proceedings of the IFIP International Seminar on Performance of D,stmbuted and Parallel Systems, pages 455-472, Dec. 1988.
|
CITED BY 28
|
|
|
|
|
Yanyong Zhang , Anand Sivasubramaniam , Jose Moreira , Hubertus Franke, A simulation-based study of scheduling mechanisms for a dynamic cluster environment, Proceedings of the 14th international conference on Supercomputing, p.100-109, May 08-11, 2000, Santa Fe, New Mexico, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sunil Nakrani , Craig Tovey, On Honey Bees and Dynamic Server Allocation in Internet Hosting Centers, Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems, v.12 n.3-4, p.223-240, September-December 2004
|
|
|
Rajesh Balan , Jason Flinn , M. Satyanarayanan , Shafeeq Sinnamohideen , Hen-I Yang, The case for cyber foraging, Proceedings of the 10th workshop on ACM SIGOPS European workshop: beyond the PC, July 01-01, 2002, Saint-Emilion, France
|
|
|
Douglas P. Ghormley , David Petrou , Steven H. Rodrigues , Thomas E. Anderson, SLIC: an extensibility system for commodity operating systems, Proceedings of the Annual Technical Conference on USENIX Annual Technical Conference, 1998, p.4-4, June 15-19, 1998, New Orleans, Louisiana
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Terry Jones , Shawn Dawson , Rob Neely , William Tuel , Larry Brenner , Jeffrey Fier , Robert Blackmore , Patrick Caffrey , Brian Maskell , Paul Tomlinson , Mark Roberts, Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System, Proceedings of the 2003 ACM/IEEE conference on Supercomputing, p.10, November 15-21, 2003
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Danhua Guo , Guangdeng Liao , Laxmi N. Bhuyan , Bin Liu , Jianxun Jason Ding, A scalable multithreaded L7-filter design for multi-core servers, Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, November 06-07, 2008, San Jose, California
|
|