A scalable, clustered SMT processor for digital signal processing
Source: ACM SIGARCH Computer Architecture News
Volume 32, Issue 3 (June 2004)
Special issue: MEDEA-2003 workshop
Pages: 62-69
Year of Publication: 2004
ISSN:0163-5964
Authors
Mladen Berekovic, University of Hannover, Germany
Sören Moch, University of Hannover, Germany
Peter Pirsch, University of Hannover, Germany
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1024295.1024304

ABSTRACT

A scalable, distributed processor architecture is presented that targets high-performance digital signal processing by combining high-frequency design techniques with a very high degree of on-chip parallelism. The architecture is based on a superscalar processor model with a modified Tomasulo scheme [1], extended to eliminate all central control structures for the data flow and to support simultaneous instruction issue from multiple independent threads (SMT). Consistent application of fine-grained clustering reduces the cycle time of wire-sensitive building blocks of the processor, such as the register file and the instruction scheduler, and leads to a distributed architecture model in which independent thread processing units, ALUs, register files, and memories are distributed across the chip and communicate with each other over special networks. The performance of the architecture scales with both the number of function units and the number of thread units without any impact on the processor's cycle time.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
R. M. Tomasulo, "An efficient algorithm for exploiting multiple arithmetic units", IBM Journal of Research and Development, Vol. 11, no. 1, January 1967, pp. 25--33.
 
2
 
3
 
4
M. H. Lipasti and J. P. Shen, "Modern Processor Design", McGraw-Hill, 2002.
5
6
7
8
9
 
10
R. P. Preston et al., "Design of an 8-wide Superscalar RISC with Simultaneous Multithreading", Solid-State Circuits Conference (ISSCC 2002), San Francisco, CA, Feb. 2002, pp. 469--471.
 
11
G. A. Kemp and M. Franklin, "PEW: A Decentralized Dynamic Scheduler for ILP Processing", Proc. Int'l Conf. on Parallel Processing, Aug. 1996, pp. 239--246.
12
 
13
 
14
 
15
R. Ho, K. W. Mai, M. A. Horowitz, "The Future of Wires", Proceedings of the IEEE, 89(4): 490--504, Apr. 2001.
 
16
The International Technology Roadmap for Semiconductors. Semiconductor Industry Association. 1999.
17
 
18
ISO/IEC JTC1/SC29/WG11 N4668, "Overview of the MPEG-4 standard," Jeju, March 2002.
 
19
Texas Instruments, "TMS320DM642 Technical Overview," Application Report SPRU615, Sep. 2002.
 
20
G. A. Slavenburg, S. Rathnam, H. Dijkstra, "The Trimedia TM-1 PCI VLIW Media Processor," Proceedings Notebook for Hot Chips VIII, pp. 171--177, Stanford, 1996.
 
21
 
22
J. L. van Meerbergen, "Lecture slides: Complex Multiprocessor architectures," www.ics.ele.tue.nl/~jef/education/5p520/index.html
 
23
 
24
M. Berekovic, H.-J. Stolberg, P. Pirsch, "Multi-Core System-On-Chip Architecture for MPEG-4 Streaming Video," Transactions on Circuits and Systems for Video Technology (CSVT), Vol. 12, No. 8, August 2002, pp. 688--699.
 
25
ARM AMBA Specification, www.arm.com.
 
26
 
27
S. Ishiwata et al., "A Single-Chip MPEG-2 Codec Based on Customizable Media Embedded Processor," IEEE Journal of Solid-State Circuits, Vol. 38, no. 3, March 2003, pp. 530--540.
28
 
29
H. Zhang, J. M. Rabaey et al., "A 1 V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications," Proc. Int'l Solid-State Circuits Conference (ISSCC), San Francisco, February 2000.
 
30
 
31
M. T. J. Strik, A. H. Timmer, J. L. van Meerbergen, and G.-J. van Rootselaar, "Heterogeneous Multiprocessor for the Management of Real-Time Video and Graphics Streams," IEEE Journal of Solid-State Circuits, Vol. 35, no. 11, November 2000, pp. 1722--1731.
 
32
 
33
G. Hinton et al., "The Microarchitecture of the Pentium 4 Processor," Intel Technology Journal, 1st quarter 2001. http://www.intel.com
 
34
 
35
 
36
 
37
 
38
 
39
40
 
41
 
42
S. Weiss and J. E. Smith, "Instruction Issue Logic in Pipelined Supercomputers," IEEE Trans. on Comp., vol. C-33, no. 11, Nov. 1984, pp. 1013--1022.
43
 
44
 
45
S. Rixner, W. J. Dally, B. Khailany, P. Mattson, U. Kapasi, and J. D. Owens, "Register Organisation for Media Processing," HPCA-6, 2000, pp. 375--386.
 
46
 
47
T. Sato, Y. Nakamura, and I. Arita, "Revisiting Direct Tag Search Algorithm on Superscalar Processors," in Workshop on Complexity-Effective Design, June 2001.
 
48
 
49
 
50
 
51
 
52
 
53
 
54
 
55
 
56
M. Johnson, Superscalar Microprocessor Design, Prentice Hall, 1990.
 
57
J. Leenstra, J. Pille, A. Müller, W. M. Sauer, R. Sautter, and D. F. Wendel, "A 1.8 GHz Instruction Window Buffer for an out-of-order Microprocessor Core", IEEE Journal of Solid-State Circuits, vol. 36, no. 11, Nov. 2001, pp. 1628--1635.
 
58
S. Vangal et al., "5-GHz 32-bit Integer Execution Core in 130-nm Dual-VT CMOS," IEEE Journal of Solid-State Circuits, vol. 37, no. 11, November 2002.
 
59
H.-J. Stolberg, M. Berekovic, P. Pirsch, "A Platform-Independent Methodology for Performance Estimation of Streaming Media Applications," Proc. 2002 IEEE International Conference on Multimedia and EXPO (ICME2002), August 2002, CD-ROM.
 
60
61
