ABSTRACT
Data-intensive applications frequently transfer large amounts of data over wide-area networks. The performance achieved in such settings can often be improved by routing data via intermediate nodes chosen to increase aggregate bandwidth. We explore the benefits of overlay network approaches by designing and implementing a service-oriented architecture that incorporates two key optimizations, multi-hop path splitting and multi-pathing, within the GridFTP file transfer protocol. We develop a file transfer scheduling algorithm that combines the two optimizations with the use of available file replicas. The algorithm uses information from past GridFTP transfers to estimate network bandwidths and resource availability. We evaluate the effectiveness of these optimizations on a wide-area testbed using several application file transfer patterns: one-to-all broadcast, all-to-one gather, and data redistribution. The experimental results show that our architecture and algorithm achieve significant performance improvements.
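As a rough illustration of the multi-hop idea only, and not the scheduling algorithm developed in the paper, the sketch below compares a direct transfer with transfers relayed through a single intermediate node. It assumes that bandwidth estimates are already available (for example, derived from past transfers) and that a relayed, pipelined transfer is limited by its slower hop. The function name `best_path`, the node labels, and the bandwidth figures are all hypothetical.

```python
def best_path(bw, src, dst, nodes):
    """Return (path, estimated_bandwidth) for the best direct or one-relay path.

    bw maps (from_node, to_node) to an estimated bandwidth; pairs with no
    estimate are treated as unusable (bandwidth 0.0). This is an
    illustrative assumption, not the paper's estimator.
    """
    best = ([src, dst], bw.get((src, dst), 0.0))
    for mid in nodes:
        if mid in (src, dst):
            continue
        # Assumption: a pipelined two-hop transfer runs at the rate of
        # its slower hop.
        est = min(bw.get((src, mid), 0.0), bw.get((mid, dst), 0.0))
        if est > best[1]:
            best = ([src, mid, dst], est)
    return best


# Hypothetical bandwidth estimates in MB/s between three sites.
bw = {("A", "B"): 10.0, ("A", "C"): 40.0, ("C", "B"): 30.0}
path, est = best_path(bw, "A", "B", ["A", "B", "C"])
print(path, est)
```

With these made-up estimates, relaying through the intermediate site is preferred because both of its hops are faster than the direct link. Multi-pathing would go further by using several such paths at once, which this sketch does not attempt.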