ABSTRACT
The design of high-throughput, large-state Viterbi decoders relies on the use of multiple arithmetic units. The global communication channels among these parallel processors often consist of long interconnect wires, resulting in large area and high power consumption. In this paper, we propose a data-transfer-oriented design methodology to implement a low-power 256-state rate-1/3 IS95 Viterbi decoder. Our architecture-level scheme uses operation partitioning, packing, and scheduling to analyze and optimize interconnect effects in the early design stages. In comparison with other published Viterbi decoders, our approach reduces global data transfers by up to 75% and decreases the number of global buses by up to 48%, while enabling the use of deeply pipelined datapaths with no data forwarding. In the RTL implementation of the individual processors, we apply precomputation in conjunction with saturation arithmetic to further reduce power dissipation with provably no degradation in coding performance. Designed using a 0.25 μm standard cell library, our decoder achieves a throughput of 20 Mbps in simulation and dissipates only 450 mW.