Improving communication-phase completion times in HPC clusters through congestion mitigation
Source: ACM International Conference Proceeding Series
Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Haifa, Israel
SESSION: Performance optimization and testing
Article No. 16
Year of Publication: 2009
ISBN:978-1-60558-623-6
Authors
Yitzhak Birk  Israel Institute of Technology, Haifa, Israel
Vladimir Zdornov  Israel Institute of Technology, Haifa, Israel
Sponsors
Mellanox Technologies
Hebrew University of Jerusalem
IBM
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1534530.1534552

ABSTRACT

Congestion arises in cluster-based supercomputers due to contention for links, spreads due to oversubscription of communication resources, and reduces performance. We mitigate it using efficient, scalable adaptive routing and explicit rate calculation. We use virtual circuits for in-order packet delivery; path setup is performed by switches locally with no blocking or backtracking. For random permutations in a slightly enriched fat-tree topology, maximum contention is reduced by up to 50% relative to static routing, but only rate control can translate this into actual gain. Unfortunately, TCP's window-based rate control fails because of the low bandwidth-delay product; moreover, small buffers cause congestion spreading even with a single-packet window. InfiniBand's CCA employs multiple parameters, which apparently must be tuned per topology and traffic pattern. Focusing on phase-based applications, we present a distributed explicit rate-assignment algorithm that minimizes the completion time of the communication phase (min-max flow completion). We also present a generally applicable packet-injection scheme for a source with different-rate flows that realizes the desired rates even with very small switch buffers. Simulations show that adaptive routing alone is ineffective and that rate control alone has limited effectiveness, yet together they shorten the communication phase by tens of percent. Finally, our explicit rate-calculation algorithm is faster than current reactive schemes.
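
The min-max flow completion objective mentioned in the abstract can be illustrated with a small centralized calculation. The sketch below is not the distributed algorithm of the paper, which the abstract does not detail. It assumes that every flow has a known size and a fixed path, that every link has a known capacity, and that all flows start together. The function name, the flow and link identifiers, and the data layout are hypothetical. Under these assumptions, the shortest possible phase duration T is set by the most heavily loaded link, and giving each flow the rate size / T lets all flows finish at time T without exceeding any link capacity.

```python
from collections import defaultdict
from fractions import Fraction


def min_max_completion_rates(flows, capacity):
    """Centralized sketch of min-max flow completion (illustrative only).

    flows:    {flow_id: (size, [link, ...])}  size in data units, fixed path
    capacity: {link: capacity}                data units per time unit
    Returns (T, rates): the minimum phase duration and a per-flow rate.
    """
    if not flows:
        raise ValueError("at least one flow is required")

    # Total data that must cross each link during the communication phase.
    load = defaultdict(Fraction)
    for size, path in flows.values():
        for link in path:
            load[link] += Fraction(size)

    # No schedule can finish before the most heavily loaded link drains.
    T = max(load[link] / Fraction(capacity[link]) for link in load)

    # Rate size/T makes every flow finish exactly at T; on every link the
    # assigned rates sum to load/T, which is at most the link capacity.
    rates = {fid: Fraction(size) / T for fid, (size, _) in flows.items()}
    return T, rates


# Hypothetical example: flows "a" and "b" share the "core" link.
example_flows = {
    "a": (100, ["s1-up", "core"]),
    "b": (50, ["s2-up", "core"]),
    "c": (50, ["s3-up"]),
}
example_capacity = {"s1-up": 10, "s2-up": 10, "s3-up": 10, "core": 10}

T, rates = min_max_completion_rates(example_flows, example_capacity)
print("phase duration:", T)
for fid, rate in sorted(rates.items()):
    print(fid, "rate:", rate)
```

In this example the "core" link carries 150 data units at capacity 10, so T is 15 time units; flow "a" is assigned rate 20/3 and flows "b" and "c" each get 10/3. Flow "c" could run faster on its uncontended link, but doing so would not shorten the phase, which is why equalizing completion times is sufficient for this objective.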


