ABSTRACT
Increasing integrated-circuit pin bandwidth has motivated a corresponding increase in the degree or radix of interconnection networks and their routers. This paper introduces the flattened butterfly, a cost-efficient topology for high-radix networks. On benign (load-balanced) traffic, the flattened butterfly approaches the cost/performance of a butterfly network and has roughly half the cost of a comparable-performance Clos network. The advantage over the Clos is achieved by eliminating redundant hops when they are not needed for load balance. On adversarial traffic, the flattened butterfly matches the cost/performance of a folded-Clos network and provides an order of magnitude better performance than a conventional butterfly. In this case, global adaptive routing is used to switch the flattened butterfly from minimal to non-minimal routing - using redundant hops only when they are needed. Minimal and non-minimal, oblivious and adaptive routing algorithms are evaluated on the flattened butterfly. We show that load-balancing adversarial traffic requires non-minimal globally-adaptive routing and show that sequential allocators are required to avoid transient load imbalance when using adaptive routing algorithms. We also compare the cost of the flattened butterfly to folded-Clos, hypercube, and butterfly networks with identical capacity and show that the flattened butterfly is more cost-efficient than folded-Clos and hypercube topologies.
INDEX TERMS
Primary Classification:
C. Computer Systems Organization
C.1 PROCESSOR ARCHITECTURES
C.1.2 Multiple Data Stream Architectures (Multiprocessors)
Subjects: Interconnection architectures (e.g., common bus, multiport memory, crossbar switch)
Additional Classification:
B. Hardware
B.4 INPUT/OUTPUT AND DATA COMMUNICATIONS
B.4.3 Interconnections (subsystems)
Subjects: Topology (e.g., bus, point-to-point)
General Terms: Design, Performance
Keywords: cost model, flattened butterfly, global adaptive routing, high-radix routers, interconnection networks, topology