ABSTRACT
The emerging class of adaptive, real-time, data-driven applications poses a significant problem for today's HPC systems. In general, it is extremely difficult for queuing-system-controlled HPC resources to make and guarantee a tightly bounded prediction of the time at which a newly submitted application will execute. While a reservation-based approach partially addresses the problem, it can create severe resource under-utilization (unused reservations, necessary scheduled idle slots, underutilized reservations, etc.) that resource providers are eager to avoid. In contrast, this paper presents a fundamentally different approach to guaranteeing predictable execution. By creating a virtualized application layer called the performance container, and opportunistically multiplexing concurrent performance containers through the application of formal feedback control theory, we regulate a job's progress so that it meets its deadline without requiring exclusive access to resources, even in the presence of a wide class of unexpected disturbances. Our evaluation using two widely used applications, WRF and BLAST, on an 8-core server shows that our approach is predictable and meets deadlines with an average error of 3.4% while achieving high overall utilization.
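The idea of regulating a job's progress toward a deadline with feedback control can be illustrated with a small simulation. The sketch below is not the paper's controller or its performance container: it is a generic proportional-integral (PI) loop over a toy job model, and every name and number in it (`simulate`, `load_spike`, the gains `kp` and `ki`, the rate `full_rate`) is a hypothetical choice made for illustration only.

```python
def load_spike(t):
    """Hypothetical disturbance: a co-running load slows the job by 30%
    between t=30 and t=60 (arbitrary time units)."""
    return 0.3 if 30.0 <= t < 60.0 else 0.0


def simulate(deadline=100.0, dt=1.0, kp=20.0, ki=2.0,
             full_rate=0.02, disturbance=None, closed_loop=True):
    """Return the time at which a simulated job reaches 100% progress.

    Toy model (an assumption, not taken from the paper): progress grows at
    share * full_rate * (1 - slowdown) per time unit, where share in [0, 1]
    is the fraction of the machine given to the job.
    """
    progress = 0.0   # fraction of the job completed so far
    integral = 0.0   # accumulated tracking error (the "I" term)
    share = 0.5      # initial resource share
    t = 0.0
    while progress < 1.0 and t < 10 * deadline:
        slowdown = disturbance(t) if disturbance else 0.0
        progress += share * full_rate * (1.0 - slowdown) * dt
        t += dt
        if closed_loop:
            # Reference trajectory: linear progress that hits 100% at the deadline.
            target = min(t / deadline, 1.0)
            error = target - progress
            integral += error * dt
            # PI control law, clamped to a valid resource share.
            share = min(1.0, max(0.0, kp * error + ki * integral))
    return t


if __name__ == "__main__":
    print("fixed share :", simulate(disturbance=load_spike, closed_loop=False))
    print("PI feedback :", simulate(disturbance=load_spike, closed_loop=True))
```

In this toy setting, the fixed-share run finishes several time units after the deadline because nothing compensates for the slowdown, while the feedback-controlled run raises the job's share during the disturbance and finishes close to the deadline without ever needing the whole machine.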