ACM Home Page
Please provide us with feedback. Feedback
A fast topology inference: a building block for network-aware parallel processing
Full text PdfPdf (750 KB)
Source
High Performance Distributed Computing archive
Proceedings of the 16th international symposium on High performance distributed computing table of contents
Monterey, California, USA
SESSION: Networking table of contents
Pages: 11 - 22  
Year of Publication: 2007
ISBN:978-1-59593-673-8
Authors
Tatsuya Shirai  University of Tokyo
Hideo Saito  University of Tokyo
Kenjiro Taura  University of Tokyo
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 72,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1272366.1272369
What is a DOI?

ABSTRACT

Adapting to the network is the key to achieving high performance for communication-intensive applications, including scientific computing,data intensive computing, and multicast, especially in Grid environments. This paper investigates an approach of representing network as a tree of participating hosts and switches matching or approximating their physical topology, and describes a fast, non-intrusive, and portable algorithm for inferring such a topology. This representation and the proposed inference algorithm serves as a key to building network-aware applications in a portable manner. The algorithm is based solely on RTTs of small packets between end hosts; it does not rely on popular but not universally available protocols such as trace route and SNMP. Another benefit is that it can handle all layers of network uniformly without any a priori knowledge of cluster configurations. The required number of measurements is O(Nd) in certain idealizing assumptions made for the purpose of analysis, where N is the number of participating processes and d the diameter of the network, which is usually small in real networks. In our experimental environment, the inference algorithm built a topology of 64 hosts in a single cluster in 4 seconds and and that of 256 hosts across 4 clusters in 15 seconds. It is able to not only identify clusters within a Grid, but also to partially identify the Layer 2 topology within a cluster. This is important for optimizing bandwidth-limited operations such as broadcast. We built several network-aware applications upon the inference system, including efficient bandwidth measurements and long message broadcasts. The topology is used to schedule as many measurements as possible in parallel without competing on shared links. We were able to build a bandwidth map of 256 hosts across 4 clusters in 27 seconds.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
 
4
5
6
 
7
 
8
9
 
10
N. Duffield, J. Horowitz, F. L. Presti, and D. Towsley. Multicast topology inference from measured end-to-end loss. IEEE Transactions in Information Theory, 48: 26--45, 2002.
 
11
 
12
 
13
14
15
 
16
 
17
G. Shao, F. Berman, and R. Wolski. Using Effective Network Views to Promote Distributed Application Performance. In Proceedings of the 1999 International Conference on Parallel and Distributed Processing Techniques and Applications, pages 2649--2656, 1999.
 
18
Skitter. http://www.caida.org/tools/measurement/skitter.
 
19


Collaborative Colleagues:
Tatsuya Shirai: colleagues
Hideo Saito: colleagues
Kenjiro Taura: colleagues