|
ABSTRACT
This paper addresses the problem of scheduling concurrent jobs on clusters where application data is stored on the computing nodes. This setting, in which scheduling computations close to their data is crucial for performance, is increasingly common and arises in systems such as MapReduce, Hadoop, and Dryad as well as many grid-computing environments. We argue that data-intensive computation benefits from a fine-grain resource sharing model that differs from the coarser semi-static resource allocations implemented by most existing cluster computing architectures. The problem of scheduling with locality and fairness constraints has not previously been extensively studied under this resource-sharing model. We introduce a powerful and flexible new framework for scheduling concurrent distributed jobs with fine-grain resource sharing. The scheduling problem is mapped to a graph datastructure, where edge weights and capacities encode the competing demands of data locality, fairness, and starvation-freedom, and a standard solver computes the optimal online schedule according to a global cost model. We evaluate our implementation of this framework, which we call Quincy, on a cluster of a few hundred computers using a varied workload of data-and CPU-intensive jobs. We evaluate Quincy against an existing queue-based algorithm and implement several policies for each scheduler, with and without fairness constraints. Quincy gets better fairness when fairness is requested, while substantially improving data locality. The volume of data transferred across the cluster is reduced by up to a factor of 3.9 in our experiments, leading to a throughput increase of up to 40%.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
The hadoop fair scheduler. https://issues.apache.org/jira/browse/HADOOP-3746.
|
| |
2
|
Open MPI. http://www.open-mpi.org/.
|
| |
3
|
Hadoop wiki. http://wiki.apache.org/hadoop/, April 2008.
|
| |
4
|
P. Agrawal, D. Kifer, and C. Olston. Scheduling Shared Scans of Large Data Files. In Proc. VLDB, pages 958--969, 2008.
|
| |
5
|
K. Amiri, D. Petrou, G. Ganger, and G. Gibson. Dynamic function placement for data-intensive cluster computing. In Usenix Annual Technical Conference, 2000.
|
| |
6
|
J. Bent, D. Thain, A.C. Arpaci-Dusseau, R.H. Arpaci-Dusseau, and M. Livny. Explicit Control in a Batch-Aware Distributed File System. In Proc. NSDI, March 2004.
|
| |
7
|
J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In Proc. OSDI, pages 137--150, December 2004.
|
| |
8
|
J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008.
|
| |
9
|
A.C. Dusseau, R.H. Arpaci, and D.E. Culler. Effective distributed scheduling of parallel workloads. SIGMETRICS Performance Evaluation Review, 24(1):25--36, 1996.
|
| |
10
|
D. Feitelson and L. Rudolph. Gang scheduling performance benefits for finegrained synchronization. Journal of Parallel and Distributed Computing, 16(4):306--18, 1992.
|
| |
11
|
L.R. Ford, Jr. and D.R. Fulkerson. Flows in Networks. Princeton Univ. Press, Princeton, NJ, 1962.
|
| |
12
|
S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP '03), pages 29--43, October 2003.
|
| |
13
|
A.V. Goldberg. An Efficient Implementation of a Scaling Minimum-Cost Flow Algorithm. J. Alg., 22:1--29, 1997.
|
| |
14
|
A.V. Goldberg and R.E. Tarjan. Finding Minimum-Cost Circulations by Successive Approximation. Math. Oper. Res., 15:430--466, 1990.
|
| |
15
|
A. Gulati, I. Ahmad, and C.A. Waldspurger. PARDA: Proportional Allocation of Resources for Distributed Storage Access. In Proceedings of the Seventh USENIX Conference on File and Storage Technologies (FAST'09), pages 85--98, February 2009.
|
| |
16
|
D.S. Hochbaum, editor. Approximation algorithms for NP-hard problems. PWS Publishing Co., Boston, MA, USA, 1997.
|
| |
17
|
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In Proc. Eurosys, pages 59--72, March 2007.
|
| |
18
|
R. Jain, D. Chiu, and W. Hawe. A Quantitative Measure Of Fairness And Discrimination For Resource Allocation In Shared Computer Systems. Technical Report TR-301, DEC Research Report, September 1984.
|
| |
19
|
J. Kay and P. Lauder. A fair share scheduler. Communications of the ACM, 31(1):44--55, 1988.
|
| |
20
|
B.W. Lampson. A scheduling philosophy for multi-processing systems. In Proceedings of the first ACM symposium on Operating System Principles (SOSP'67), pages 8.1--8.24, 1967.
|
| |
21
|
V.K. Naik, S.K. Setia, and M.S. Squillante. Performance analysis of job scheduling policies in parallel supercomputing environments. In Proceedings of Supercomputing, pages 824--833, November 1993.
|
| |
22
|
M.G. Norman and P. Thanisch. Models of machines and computation for mapping in multicomputers. ACM Comput. Surv., 25(3):263--302, 1993.
|
| |
23
|
J.B. Orlin. A Faster Strongly Polynomial Minimum Cost Flow Algorithm. J. Oper. Res., 41:338--350, 1993.
|
| |
24
|
J.K. Ousterhout. Scheduling Techniques for Concurrent Systems. In Proceedings of the Third International Conference on Distributed Computing Systems (ICDCS'82), pages 22--30, January 1982.
|
| |
25
|
R. Raman, M. Livny, and M. Solomon. Matchmaking: Distributed Resource Management for High Throughput Computing. In Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing (HPDC 7), July 1998.
|
| |
26
|
R. Raman, M. Livny, and M. Solomon. Policy driven heterogeneous resource co-allocation with gangmatching. In Proc. High Performance Distributed Computing, pages 80--89, 2003.
|
| |
27
|
B. Schroeder and M. Harchol-Balter. Evaluation of task assignment policies for supercomputing servers: The case for load unbalancing and fairness. In Proceedings of High Performance Distributed Computing (HPDC'00), pages 211--219, 2000.
|
| |
28
|
H. Stone. Multiprocessor scheduling with the aid of network flow algorithms. IEEE Transactions on Software Engineering, SE-3(1):85--93, 1977.
|
| |
29
|
D. Thain, T. Tannenbaum, and M. Livny. Distributed Computing in Practice: The Condor Experience. Concurrency and Computation: Practice and Experience, 17(2):323--356, February 2005.
|
| |
30
|
D. Wright. Cheap cycles from the desktop to the dedicated cluster: combining opportunisitc and dedicated scheduling with Condor. In Conference on Linux Clusters: The HPC Revolution, 2001.
|
| |
31
|
Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P.K. Gunda, and J. Currey. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. In Proc. OSDI, pages 1--14, San Diego, CA, December 2008.
|
| |
32
|
M. Zaharia, D. Borthakur, J.S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica. Job Scheduling for Multi-User MapReduce Clusters. Technical Report UCB/EECS-2009-55, University of California at Berkeley, April 2009.
|
| |
33
|
M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz, and I. Stoica. Improving MapReduce Performance in Heterogeneous Environments. In Proc. OSDI, pages 29--42, San Diego, CA, December 2008.
|
| |
34
|
X. Zhang, S. Dwarkadas, G. Folkmanis, and K. Shen. Processor Hardware Counter Statistics As A First-Class System Resource. In Proceedings of 11th Workshop on Hot Topics in Operating Systems (HOTOS'07), 2007.
|
|