ABSTRACT
A scalable, distributed processor architecture is presented that targets high-performance computing for digital signal processing applications by combining high-frequency design techniques with a very high degree of on-chip parallel processing. The architecture is based on a superscalar processor model with a modified Tomasulo scheme [1] that was extended to eliminate all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads (simultaneous multithreading, SMT). Consistent application of fine-grained clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file and the instruction scheduler, and leads to a distributed architecture model in which independent thread processing units, ALUs, register files, and memories are distributed across the chip and communicate with each other over special networks. The performance of the architecture scales with both the number of function units and the number of thread units without any impact on the processor's cycle time.
REFERENCES
[1] R. M. Tomasulo, "An efficient algorithm for exploiting multiple arithmetic units," IBM Journal of Research and Development, Vol. 11, no. 1, January 1967, pp. 25-33.
[2] Alan Allan, Don Edenfeld, William H. Joyner, Jr., Andrew B. Kahng, Mike Rodgers, Yervant Zorian, 2001 Technology Roadmap for Semiconductors, Computer, v.35 n.1, p.42-53, January 2002. doi: 10.1109/2.976918
[4] M. H. Lipasti and J. P. Shen, "Modern Processor Design," McGraw-Hill, 2002.
[5] Subbarao Palacharla, Norman P. Jouppi, J. E. Smith, Complexity-effective superscalar processors, Proceedings of the 24th annual international symposium on Computer architecture, p.206-218, June 01-04, 1997, Denver, Colorado, United States
[6] Hiroaki Hirata, Kozo Kimura, Satoshi Nagamine, Yoshiyuki Mochizuki, Akio Nishimura, Yoshimori Nakase, Teiji Nishizawa, An elementary processor architecture with simultaneous instruction issuing from multiple threads, Proceedings of the 19th annual international symposium on Computer architecture, p.136-145, May 19-21, 1992, Queensland, Australia
[9] Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm, Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor, Proceedings of the 23rd annual international symposium on Computer architecture, p.191-202, May 22-24, 1996, Philadelphia, Pennsylvania, United States
[10] R. P. Preston et al., "Design of an 8-wide Superscalar RISC with Simultaneous Multithreading," Solid-State Circuits Conference (ISSCC2002), San Francisco, CA, Feb. 2002, pp. 469-471.
[11] G. A. Kemp and M. Franklin, "PEW: A Decentralized Dynamic Scheduler for ILP Processing," Proc. Int'l Conf. on Parallel Processing, Aug. 1996, pp. 239-246.
[13] Keith I. Farkas, Paul Chow, Norman P. Jouppi, Zvonko Vranesic, The multicluster architecture: reducing cycle time through partitioning, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.149-159, December 01-03, 1997, Research Triangle Park, North Carolina, United States
[15] R. Ho, K. W. Mai, M. A. Horowitz, "The Future of wires," Proceedings of the IEEE, 89(4): 490-504, Apr. 2001.
[16] The International Technology Roadmap for Semiconductors. Semiconductor Industry Association. 1999.
[17] Vikas Agarwal, M. S. Hrishikesh, Stephen W. Keckler, Doug Burger, Clock rate versus IPC: the end of the road for conventional microarchitectures, Proceedings of the 27th annual international symposium on Computer architecture, p.248-259, June 2000, Vancouver, British Columbia, Canada
[18] ISO/IEC JTC/SC29/WG11 N4668, "Overview of the MPEG-4 standard," Jeju, March 2002.
[19] Texas Instruments, "TMS320DM642 Technical Overview," Application Report SPRU615, Sep. 2002.
[20] G. A. Slavenburg, S. Rathnam, H. Dijkstra, "The Trimedia TM-1 PCI VLIW Media Processor," Proceedings Notebook for Hot Chips VIII, pp. 171-177, Stanford, 1996.
[22] J. L. van Meerbergen, "Lecture slides: Complex Multiprocessor architectures," www.ics.ele.tue.nl/~jef/education/5p520/index.html
[24] M. Berekovic, H.-J. Stolberg, P. Pirsch, "Multi-Core System-On-Chip Architecture for MPEG-4 Streaming Video," Transactions on Circuits and Systems for Video Technology (CSVT), Vol. 12, No. 8, August 2002, pp. 688-699.
[25] ARM AMBA Specification, www.ARM.com.
[27] S. Ishiwata et al., "A Single-Chip MPEG-2 Codec Based on Customizable Media Embedded Processor," IEEE Journal of Solid-State Circuits, Vol. 38, no. 3, March 2003, pp. 530-540.
[28] Paolo Faraboschi, Geoffrey Brown, Joseph A. Fisher, Giuseppe Desoli, Fred Homewood, Lx: a technology platform for customizable VLIW embedded processing, Proceedings of the 27th annual international symposium on Computer architecture, p.203-213, June 2000, Vancouver, British Columbia, Canada
[29] H. Zhang, J. M. Rabaey et al., "A 1 V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications," Proc. Int'l Solid-State Circuits Conference (ISSCC), San Francisco, February 2000.
[31] M. T. J. Strik, A. H. Timmer, J. L. van Meerbergen, and G.-J. van Rootselaar, "Heterogeneous Multiprocessor for the Management of Real-Time Video and Graphics Streams," IEEE Journal of Solid-State Circuits, Vol. 35, no. 11, November 2000, pp. 1722-1731.
[33] G. Hinton et al., "The Microarchitecture of the Pentium 4 Processor," Intel Technology Journal, 1st quarter 2001. http://www.intel.com
[39] John Nickolls, L. J. Madar III, Scott Johnson, Viresh Rustagi, Ken Unger, Mustafiz Choudhury, Calisto: A Low-Power Single-Chip Multiprocessor Communications Platform, IEEE Micro, v.23 n.2, p.29-43, March 2003. doi: 10.1109/MM.2003.1196113
[42] S. Weiss and J. E. Smith, "Instruction Issue Logic in Pipelined Supercomputers," IEEE Trans. on Comp., vol. C-33, No. 11, Nov. 1984, pp. 1013-1022.
[45] S. Rixner, W. J. Dally, B. Khailany, P. Mattson, U. Kapasi, and J. D. Owens, "Register Organisation for Media Processing," HPCA-6, 2000, pp. 375-386.
[46] Eric Rotenberg, Quinn Jacobson, Yiannakis Sazeides, Jim Smith, Trace processors, Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, p.138-148, December 01-03, 1997, Research Triangle Park, North Carolina, United States
[47] T. Sato, Y. Nakamura, and I. Arita, "Revisiting Direct Tag Search Algorithm on Superscalar Processors," in Workshop on Complexity-Effective Design, June 2001.
[51] Michael Bedford Taylor, Jason Kim, Jason Miller, David Wentzlaff, Fae Ghodrat, Ben Greenwald, Henry Hoffman, Paul Johnson, Jae-Wook Lee, Walter Lee, Albert Ma, Arvind Saraf, Mark Seneski, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, Anant Agarwal, The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs, IEEE Micro, v.22 n.2, p.25-35, March 2002. doi: 10.1109/MM.2002.997877
[52] Brucek Khailany, William J. Dally, Ujval J. Kapasi, Peter Mattson, Jinyung Namkoong, John D. Owens, Brian Towles, Andrew Chang, Scott Rixner, Imagine: Media Processing with Streams, IEEE Micro, v.21 n.2, p.35-46, March 2001. doi: 10.1109/40.918001
[55] Mladen Berekovic, Hans-Joachim Stolberg, Mark B. Kulaczewski, Peter Pirsch, Henning Möller, Holger Runge, Johannes Kneip, Benno Stabernack, Instruction Set Extensions for MPEG-4 Video, Journal of VLSI Signal Processing Systems, v.23 n.1, p.27-49, Oct. 1 1999. doi: 10.1023/A:1008188618930
[56] M. Johnson, Superscalar Microprocessor Design, Prentice Hall, 1990.
[57] J. Leenstra, J. Pille, A. Müller, W. M. Sauer, R. Sautter, and D. F. Wendel, "A 1.8 GHz Instruction Window Buffer for an out-of-order Microprocessor Core," IEEE Journal of Solid-State Circuits, V.36, No.11, Nov. 2001, pp. 1628-1635.
[58] S. Vangal et al., "5-GHz 32-bit Integer Execution Core in 130-nm Dual-VT CMOS," IEEE Journal of Solid-State Circuits, vol. 37, no. 11, November 2002.
[59] H.-J. Stolberg, M. Berekovic, P. Pirsch, "A Platform-Independent Methodology for Performance Estimation of Streaming Media Applications," Proc. 2002 IEEE International Conference on Multimedia and EXPO (ICME2002), August 2002, CD-ROM.