Hybrid multithreading for VLIW processors
Source
International Conference on Compilers, Architecture and Synthesis for Embedded Systems
Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Grenoble, France
SESSION: Compiler techniques for performance
Pages 37-46  
Year of Publication: 2009
ISBN:978-1-60558-626-7
Authors
Manoj Gupta  Universitat Politecnica de Catalunya, Barcelona, Spain
Fermin Sanchez  Universitat Politecnica de Catalunya, Barcelona, Spain
Josep Llosa  Universitat Politecnica de Catalunya, Barcelona, Spain
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
ACM: Association for Computing Machinery
SIGBED: ACM Special Interest Group on Embedded Systems
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1629395.1629403

ABSTRACT

Several multithreading techniques have been proposed to reduce resource underutilization in Very Long Instruction Word (VLIW) processors. Simultaneous MultiThreading (SMT) is a popular technique that improves processor performance by issuing instructions from multiple threads in the same cycle. In VLIW processors, SMT requires extra hardware to merge instructions from different threads, and the complexity of this hardware increases substantially with the number of threads. In contrast, techniques such as Interleaved MultiThreading (IMT) need no merging hardware and support a larger number of threads at reasonable cost. In this paper, we propose Hybrid MultiThreading (HMT), a technique that, at each cycle, merges instructions from only a subset of the threads. HMT supports a reasonable number of threads at low merging hardware cost. For instance, it can support 8 hardware threads with merging hardware for only 2 threads. Experimental results show that HMT improves multithreading performance significantly. Furthermore, supporting 8 hardware threads with HMT using 4-thread merging hardware achieves performance similar to merging all 8 threads simultaneously, at a significantly lower merging hardware cost.
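
The idea of merging only a subset of threads per cycle can be illustrated with a small sketch. This is not the paper's design: the abstract does not say how the subset is chosen or how conflicts are detected, so the round-robin selection, the fixed-slot conflict check, and the names `hmt_cycle`, `slot_masks`, `start`, and `merge_width` are all assumptions made for illustration.

```python
from typing import List, Sequence


def hmt_cycle(slot_masks: Sequence[int], start: int, merge_width: int) -> List[int]:
    """Return the ids of the threads that issue in one cycle.

    slot_masks[t] is a bitmask of the issue slots needed by thread t's
    next VLIW instruction (0 means the thread has nothing to issue).
    Assumption: only `merge_width` candidate threads, picked round-robin
    beginning at `start`, are visible to the merge logic this cycle.
    Assumption: two instructions can merge only if their slot masks are
    disjoint (operations are not re-routed to other slots).
    """
    n = len(slot_masks)
    candidates = [(start + i) % n for i in range(merge_width)]
    used = 0          # issue slots already claimed this cycle
    issued = []
    for t in candidates:
        mask = slot_masks[t]
        if mask and not (used & mask):
            used |= mask
            issued.append(t)
    return issued


# Hypothetical workload: 8 hardware threads on a 4-slot machine.
masks = [0b0011, 0b1100, 0b0001, 0b0110, 0b1000, 0b0011, 0b0100, 0b1111]

print(hmt_cycle(masks, start=0, merge_width=2))  # 2-thread merge hardware
print(hmt_cycle(masks, start=3, merge_width=1))  # degenerates to IMT-like issue
print(hmt_cycle(masks, start=0, merge_width=8))  # degenerates to SMT-like merge
```

In this sketch, `merge_width=1` behaves like interleaved issue and `merge_width` equal to the thread count behaves like a full merge, which is one way to see HMT as sitting between IMT and SMT. The cost of real merge hardware depends on details that the sketch does not model.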

