ABSTRACT
Congestion arises in cluster-based supercomputers due to contention for links, spreads due to oversubscription of communication resources, and reduces performance. We mitigate it using efficient, scalable adaptive routing and explicit rate calculation. We use virtual circuits for in-order packet delivery; path setup is performed by switches locally, with no blocking or backtracking. For random permutations in a slightly enriched fat-tree topology, maximum contention is reduced by up to 50% relative to static routing, but only rate control can translate this reduction into actual gain. Unfortunately, TCP's window-based rate control fails because of the low bandwidth-delay product; moreover, small buffers cause congestion spreading even with a single-packet window. InfiniBand's CCA employs multiple parameters, which must apparently be tuned per topology and traffic pattern. Focusing on phase-based applications, we present a distributed explicit rate-assignment algorithm that minimizes the completion time of the communication phase (min-max flow completion). We also present a generally applicable packet-injection scheme for a source with flows of different rates, which realizes the desired rates even with very small switch buffers. Simulations show that adaptive routing alone is ineffective and that rate control alone has limited effectiveness, yet together they shorten the communication phase by tens of percent. Finally, our explicit rate-calculation algorithm is faster than current reactive schemes.
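The min-max flow completion objective mentioned above can be illustrated with a small centralized sketch. This is not the paper's distributed algorithm; it only shows the objective under simplifying assumptions: every flow has a known size and a fixed path, and all flows are paced to finish together. The function name `min_max_completion_rates` and the input layout are hypothetical.

```python
from collections import defaultdict

def min_max_completion_rates(flows, capacity):
    """Illustrative only (assumed model, not the paper's algorithm).

    flows:    dict flow_id -> (size_in_bytes, list_of_link_ids)
    capacity: dict link_id -> bytes per second
    Returns (phase_time, rates) where rates maps flow_id -> bytes per second.
    """
    if not flows:
        return 0.0, {}
    # Total bytes that must cross each link during the phase.
    load = defaultdict(float)
    for size, path in flows.values():
        for link in path:
            load[link] += size
    # The most heavily loaded link, relative to its capacity, bounds the phase time.
    phase_time = max(load[link] / capacity[link] for link in load)
    # Pace every flow so that it finishes exactly at phase_time.
    rates = {fid: size / phase_time for fid, (size, _) in flows.items()}
    return phase_time, rates

flows = {
    "a": (100.0, ["l1", "l2"]),
    "b": (300.0, ["l2"]),
    "c": (100.0, ["l3"]),
}
capacity = {"l1": 100.0, "l2": 100.0, "l3": 100.0}
phase_time, rates = min_max_completion_rates(flows, capacity)
```

In this example link `l2` carries 400 bytes at 100 bytes per second, so the phase cannot finish in under 4 seconds. Pacing each flow at its size divided by that time keeps every link within its capacity, and no flow finishes later than the bottleneck requires.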