Instruction scheduling for clustered VLIW architectures
Source: International Symposium on Systems Synthesis
Proceedings of the 13th International Symposium on System Synthesis
Madrid, Spain
SESSION: Code generation and scheduling
Pages: 41 - 46  
Year of Publication: 2000
ISBN:1080-1082
Authors
Jesús Sánchez  Universitat Politècnica de Catalunya, Dept. of Computer Architecture, Barcelona - SPAIN, E-mail: fran@ac.upc.es
Antonio González  Universitat Politècnica de Catalunya, Dept. of Computer Architecture, Barcelona - SPAIN, E-mail: antonio@ac.upc.es
Sponsors
IEEE : IEEE Computer Society Technical Committee on Design Automation
SIGDA: ACM Special Interest Group on Design Automation
Publisher
IEEE Computer Society  Washington, DC, USA
DOI: http://doi.acm.org/10.1145/501790.501801

ABSTRACT

Clustered VLIW organizations are now common in the design of embedded/DSP processors. In this work we propose a novel modulo scheduling approach for such architectures. The proposed technique performs cluster assignment and instruction scheduling in a single pass, which is more effective than performing the assignment first and the scheduling later. We also show that loop unrolling significantly enhances the performance of the proposed scheduler, especially when the communication channel among clusters is the main performance bottleneck. By selectively unrolling some loops, we can obtain the best performance with the minimum increase in code size. Performance evaluation on SPECfp95 shows that the clustered architecture achieves about the same IPC (Instructions Per Cycle) as a unified architecture with the same resources. Moreover, when the cycle time is taken into account, a 4-cluster configuration is 3.6 times faster than the unified architecture.
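
The abstract does not spell out the algorithm, so the following is only a toy Python sketch of the general idea of choosing a cluster and a cycle for each operation in the same pass of a modulo scheduler. It is not the authors' method. The machine model (identical functional units per cluster, a shared bus whose copy is issued just in time), the function names, and the example loop body are all assumptions made for illustration, and recurrences are ignored.

```python
from math import ceil


def _try_ii(ops, deps, ii, clusters, fus, buses, bus_latency):
    """Try to place every op for a fixed initiation interval (II).

    Returns {op: (cycle, cluster)} or None if some op cannot be placed.
    """
    fu_use = [[0] * ii for _ in range(clusters)]  # modulo reservation table per cluster
    bus_use = [0] * ii                            # shared inter-cluster bus slots
    placement = {}
    for name in ops:  # ops is assumed to be in topological order
        preds = deps.get(name, [])
        best = None  # (cycle, number of copies, cluster)
        for c in range(clusters):
            # Operands produced in another cluster need a bus copy.
            remote = [p for p in preds if placement[p][1] != c]
            earliest = 0
            for p in preds:
                p_cycle, p_cluster = placement[p]
                extra = bus_latency if p_cluster != c else 0
                earliest = max(earliest, p_cycle + ops[p] + extra)
            # Only II consecutive cycles need checking: slots repeat modulo II.
            for t in range(earliest, earliest + ii):
                fu_ok = fu_use[c][t % ii] < fus
                # Simplification: every copy is issued exactly bus_latency
                # cycles before the consumer starts.
                slot = (t - bus_latency) % ii
                bus_ok = not remote or bus_use[slot] + len(remote) <= buses
                if fu_ok and bus_ok:
                    cand = (t, len(remote), c)
                    if best is None or cand < best:
                        best = cand
                    break
        if best is None:
            return None
        t, copies, c = best
        # Cluster and cycle are committed together, in one step.
        fu_use[c][t % ii] += 1
        if copies:
            bus_use[(t - bus_latency) % ii] += copies
        placement[name] = (t, c)
    return placement


def modulo_schedule(ops, deps, clusters=2, fus=1, buses=1, bus_latency=1, max_ii=64):
    """ops: {name: latency} in topological order; deps: {name: [predecessors]}.

    Returns (ii, placement). Starts from the resource-constrained lower
    bound on II and increases it until a schedule is found.
    """
    mii = ceil(len(ops) / (clusters * fus))
    for ii in range(mii, max_ii + 1):
        placement = _try_ii(ops, deps, ii, clusters, fus, buses, bus_latency)
        if placement is not None:
            return ii, placement
    raise RuntimeError("no schedule found up to max_ii")


# Hypothetical loop body: two loads feed a multiply, then an add and a store.
ops = {"ld_a": 2, "ld_b": 2, "mul": 3, "add": 1, "st": 1}
deps = {"mul": ["ld_a", "ld_b"], "add": ["mul"], "st": ["add"]}

ii, placement = modulo_schedule(ops, deps)
print("II =", ii)
for name, (cycle, cluster) in placement.items():
    print(f"{name}: cycle {cycle}, cluster {cluster}")
```

In this sketch a two-phase scheme would fix the cluster of every operation before any cycle is known. Here each candidate cluster is scored by the cycle it would actually yield, including the bus delay, so communication cost influences the assignment directly.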


