Optimizing scientific application loops on stream processors
Source
Language, Compiler and Tool Support for Embedded Systems
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Tucson, AZ, USA
SESSION: Register allocation
Pages 161-170  
Year of Publication: 2008
ISBN:978-1-60558-104-0
Authors
Li Wang  NUDT, ChangSha, China
Xuejun Yang  NUDT, ChangSha, China
Jingling Xue  UNSW, Sydney, Australia
Yu Deng  NUDT, ChangSha, China
Xiaobo Yan  NUDT, ChangSha, China
Tao Tang  NUDT, ChangSha, China
Quan Hoang Nguyen  UNSW, Sydney, Australia
Sponsors
ACM: Association for Computing Machinery
SIGBED: ACM Special Interest Group on Embedded Systems
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
SIGDA: ACM Special Interest Group on Design Automation
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1375657.1375679

ABSTRACT

This paper describes a graph coloring compiler framework to allocate on-chip SRF (Stream Register File) storage for optimizing scientific applications on stream processors. Our framework first applies enabling optimizations such as loop unrolling to expose stream reuse and opportunities for maximizing parallelism, i.e., overlapping kernel execution and memory transfers. Then the three SRF management tasks are solved in a unified manner via graph coloring: (1) placing streams in the SRF, (2) exploiting stream reuse, and (3) maximizing parallelism. We evaluate the performance of our compiler framework by running nine representative scientific computing kernels on our FT64 stream processor. Our preliminary results show that compiler management achieves an average speedup of 2.3x compared to First-Fit allocation. Compared with running the same benchmarks on Itanium 2, an average speedup of 2.1x is observed.
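
To illustrate the general idea of allocating storage by graph coloring, the following is a minimal sketch and not the paper's algorithm. It assumes that every stream occupies exactly one equally sized SRF slot and that two streams interfere whenever their live ranges overlap; the names build_interference, greedy_color, and the sample live ranges are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_interference(live):
    """Build an interference graph from live ranges.

    live maps a stream name to (first_use, last_use), both inclusive.
    Two streams interfere when their live ranges overlap.
    """
    graph = {stream: set() for stream in live}
    for a, b in combinations(live, 2):
        (a_start, a_end), (b_start, b_end) = live[a], live[b]
        if a_start <= b_end and b_start <= a_end:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_color(graph):
    """Assign each node the lowest slot not used by a colored neighbor.

    Nodes are visited in order of decreasing degree (ties broken by name).
    This is a simple heuristic, not an optimal coloring.
    """
    slots = {}
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        taken = {slots[m] for m in graph[node] if m in slots}
        slot = 0
        while slot in taken:
            slot += 1
        slots[node] = slot
    return slots


# Hypothetical live ranges, measured in kernel steps.
live = {"a": (0, 2), "b": (1, 3), "c": (3, 5), "d": (4, 6)}
graph = build_interference(live)
slots = greedy_color(graph)
print(slots)
```

In this sketch, streams that receive the same slot number can share the same SRF region because they are never live at the same time. A real allocator would also have to account for differing stream sizes, reuse across kernels, and the overlap of kernel execution with memory transfers, which this example leaves out.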



