|
ABSTRACT
An interconnection pattern of processing elements, the cube-connected cycles (CCC), is introduced which can be used as a general purpose parallel processor. Because its design complies with present technological constraints, the CCC can also be used in the layout of many specialized large scale integrated circuits (VLSI). By combining the principles of parallelism and pipelining, the CCC can emulate the cube-connected machine and the shuffle-exchange network with no significant degradation of performance but with a more compact structure. We describe in detail how to program the CCC for efficiently solving a large class of problems that include Fast Fourier transform, sorting, permutations, and derived algorithms.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
Barnes, G.H., Brown, R.M., Kato, M., Kuck, D.J,, Slotnick, DL, and Stokes, R.A. The ILLIAC IV computer. IEEE Trans. Comput. C-17, 8, (Aug. 1968), 746-757.
|
| |
3
|
Batcher, K.E., Sorting networks and their applications. Proc. AF1PS Spring Joint Computer Conf. 32, Atlantic City, N J, April 1968, 307-314.
|
 |
4
|
|
 |
5
|
|
| |
6
|
Guibas, L.J., Kung, H.T., and Thompson, C.D., Direct VSLI implementation of combinatorial algorithms. Res. Rept., Dept. Comp. Sci., Carnegie-Mellon Univ., Pittsburgh, PA, March 1979.
|
| |
7
|
Heller, D., A survey of parallel algorithms in numerical linear algebra. Dept. Comp. Sci., Carnegie-Mellon Univ., Pittsburgh, PA, Feb. 1976.
|
 |
8
|
|
| |
9
|
Hoey, D., and Leiserson, C.L., A layout for the shuffle-exchange network. Proc. 1980 Int'l Conf. on Parallel Processing, Boyne, MI, Aug. 1980, 327-336.
|
| |
10
|
|
| |
11
|
Kung, H.T., and Leiserson, C.E., Algorithms for VSLI processor arrays. Symp. on Sparse Matrix Computations, Knoxville, TN, Nov. 1978.
|
 |
12
|
|
| |
13
|
|
| |
14
|
Nassimi, D., and Sahni, S., Bitonic sort on a mesh-connected parallel computer. IEEE Trans. Comput. C-28, 1, (Jan. 1979) 2-7.
|
| |
15
|
Pease, M.C., The indirect binary n-cube microprocessor array. 1EEE Trans. Comput. C-26, 5, (May 1977) 458-473.
|
 |
16
|
|
| |
17
|
Preparata, F.P., and Vuillemin, J., Area time optimal VLSI networks based in the cube-connected cycles. Tech. Rept. INRIA n.13, Rocquencourt, France and ACT-.21, Coord. Sci. Lab., Univ. Illinois, Urbana, IL, 1980.
|
| |
18
|
Preparata, F.P., New parallel sorting schemes. 1EEE Trans. Comput. C-27, 7, (July 1978) 669-673.
|
| |
19
|
Steinberg, D., and Rodeh, M., A layout for the shuffle-exchange network with O(A T/log3/2N) area. (Submitted for publication).
|
| |
20
|
Stone, H.S,, Parallel processing with the perfect shuffle. IEEE Trans. Comput. C-20, 2, (Feb. 1971) 153-161.
|
| |
21
|
|
 |
22
|
|
 |
23
|
|
| |
24
|
Valiant, L.G., Parallelism in comparison problems. SlAM J. Comput., 4, 3, (Sept. 1975) 348-355.
|
| |
25
|
Vuillemin, J., A combinatorial limit to the computing power of VSLI circuits. Proc. 21st Symp. on Foundations of Computer Science. Syracuse, NY, Oct. 1980, 294-300
|
 |
26
|
|
CITED BY 159
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sandeep N. Bhatt , Fan R. K. Chung , Jia-Wei Hong , F. Thomson Leighton , Bojana Obrenic , Arnold L. Rosenberg , Eric J. Schwabe, Optimal emulations by butterfly-like networks, Journal of the ACM (JACM), v.43 n.2, p.293-330, March 1996
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sandeep Bhatt , Fan Chung , Jia-Wei Hong , Arnold Rosenberg, Optimal simulations by Butterfly Networks, Proceedings of the twentieth annual ACM symposium on Theory of computing, p.192-204, May 02-04, 1988, Chicago, Illinois, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yulu Yang , Akira Funahashi , Akiya Jouraku , Hiroaki Nishi , Hideharu Amano , Toshinori Sueyoshi, Recursive Diagonal Torus: An Interconnection Network for Massively Parallel Computers, IEEE Transactions on Parallel and Distributed Systems, v.12 n.7, p.701-715, July 2001
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Gene Eu Jan , Shao-Wei Leu , Cheng-Hung Li , Xiaoshe Dong, A perfect load balancing algorithm on cube-connected cycles, Proceedings of the 5th WSEAS International Conference on Computational Intelligence, Man-Machine Systems and Cybernetics, p.345-350, November 20-22, 2006, Venice, Italy
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
G. Bilardi , S. W. Hornick , M. Sarrafzadeh, Optimal VLSI architectures for multidimensional DFT, Proceedings of the first annual ACM symposium on Parallel algorithms and architectures, p.265-272, June 18-21, 1989, Santa Fe, New Mexico, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|