Hybrid multithreading for VLIW processors
Source
International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Grenoble, France
SESSION: Compiler techniques for performance
Pages 37-46
Year of Publication: 2009
ISBN:978-1-60558-626-7
Authors
Manoj Gupta, Universitat Politecnica de Catalunya, Barcelona, Spain
Fermin Sanchez, Universitat Politecnica de Catalunya, Barcelona, Spain
Josep Llosa, Universitat Politecnica de Catalunya, Barcelona, Spain
ABSTRACT
Several multithreading techniques have been proposed to reduce resource underutilization in Very Long Instruction Word (VLIW) processors. Simultaneous MultiThreading (SMT) is a popular technique that improves processor performance by issuing instructions from multiple threads in the same cycle. In VLIW processors, SMT requires extra hardware to merge instructions from different threads, and the complexity of this hardware increases substantially with the number of threads. In contrast, techniques such as Interleaved MultiThreading (IMT) need no merging hardware and support a larger number of threads at reasonable cost. In this paper, we propose Hybrid MultiThreading (HMT), a technique that merges instructions from only a subset of the threads at each cycle. HMT supports a reasonable number of threads at a low merging hardware cost; for instance, 8 hardware threads can be supported with merging hardware for only 2 threads. Experimental results show that HMT significantly improves multithreading performance. Furthermore, supporting 8 hardware threads with HMT while using 4-thread merging hardware achieves performance similar to merging all 8 threads simultaneously, at a significantly lower merging hardware cost.
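The idea of merging only a subset of threads per cycle can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say how the subset is chosen or how conflicts are detected, so the round-robin window (`start`, `merge_width`), the slot-level conflict rule in `can_merge`, and all function names are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# One VLIW instruction: a tuple of issue slots, None meaning an empty slot (NOP).
Instr = Tuple[Optional[str], ...]


def can_merge(a: Instr, b: Instr) -> bool:
    """Assumed merge rule: two instructions merge only if no slot is used by both."""
    return all(x is None or y is None for x, y in zip(a, b))


def merge(a: Instr, b: Instr) -> Instr:
    """Combine two conflict-free instructions slot by slot."""
    return tuple(x if x is not None else y for x, y in zip(a, b))


def hmt_cycle(pending: List[Optional[Instr]], start: int,
              merge_width: int) -> Tuple[Optional[Instr], List[int]]:
    """Issue one cycle, considering only `merge_width` threads from `start`.

    pending[t] is the next instruction of hardware thread t, or None if that
    thread has nothing to issue. Returns the merged bundle and the ids of the
    threads that issued. The round-robin window is an assumption of this sketch.
    """
    n = len(pending)
    window = [(start + i) % n for i in range(merge_width)]
    bundle: Optional[Instr] = None
    issued: List[int] = []
    for t in window:
        instr = pending[t]
        if instr is None:
            continue  # thread has nothing ready this cycle
        if bundle is None:
            bundle = instr  # first ready thread in the window always issues
            issued.append(t)
        elif can_merge(bundle, instr):
            bundle = merge(bundle, instr)
            issued.append(t)
    return bundle, issued
```

In this sketch, `merge_width=1` behaves like interleaved issue (one thread per cycle), while setting `merge_width` to the number of hardware threads considers every thread for merging. The 8-thread, 2-thread-merge configuration mentioned in the abstract would correspond to a `pending` list of length 8 with `merge_width=2`.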