ABSTRACT
We present a system for allocating resources in shared data and compute clusters that improves MapReduce job scheduling in three ways. First, the system uses regulated and user-assigned priorities to offer different service levels to jobs and users over time. Second, the system dynamically adjusts resource allocations to fit the requirements of different job stages. Finally, the system automatically detects and eliminates bottlenecks within a job. Experiments with real applications show that, using these three strategies, users can optimize not only job execution time but also the cost-benefit ratio or prioritization efficiency of a job. Our approach relies on a proportional share mechanism that continuously allocates virtual machine resources. Our experimental results show an 11-31% improvement in completion time and a 4-187% improvement in prioritization efficiency for different classes of MapReduce jobs. We further show that delay-intolerant users gain even more from our system.
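The proportional share idea mentioned above can be illustrated with a minimal sketch. It assumes each user places a numeric bid (standing in for a priority) and receives a fraction of the capacity equal to that bid divided by the sum of all bids. The function name, the bid dictionary, and the capacity units are hypothetical and are not taken from the system described here.

```python
def proportional_share(bids, capacity):
    """Split `capacity` among users in proportion to their bids.

    Hypothetical helper for illustration only: `bids` maps a user name
    to a non-negative number, and `capacity` is the total amount of a
    resource (for example, CPU cores) to divide.
    """
    total = sum(bids.values())
    if total <= 0:
        # No positive bids: nobody is allocated anything.
        return {user: 0.0 for user in bids}
    return {user: capacity * bid / total for user, bid in bids.items()}


# Two users bidding 3 and 1 for 8 units of capacity.
shares = proportional_share({"alice": 3.0, "bob": 1.0}, capacity=8.0)
print(shares)
```

Re-running such a function whenever bids change is one simple way to picture continuous reallocation: a user who raises a bid during a critical job stage gets a larger share from the next allocation onward.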