Exploiting loop-dependent stream reuse for stream processors
PACT
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Toronto, Ontario, Canada
SESSION: Compilation
Pages 22-31
Year of Publication: 2008
ISBN: 978-1-60558-282-5
Authors
Xuejun Yang, National University of Defence Technology, Changsha, China
Ying Zhang, National University of Defence Technology, Changsha, China
Jingling Xue, The University of New South Wales, Sydney, Australia
Ian Rogers, The University of Manchester, Manchester, United Kingdom
Gen Li, National University of Defence Technology, Changsha, China
Guibin Wang, National University of Defence Technology, Changsha, China
Bibliometrics
Downloads (6 Weeks): 11, Downloads (12 Months): 118, Citation Count: 1
ABSTRACT
Memory access limits the performance of stream processors. Exploiting the reuse of data held in the Stream Register File (SRF), an on-chip storage, reduces the number of memory accesses. Current stream compilers attempt reuse only for simple stream references, those whose start and end are known. Compiler analyses developed outside the stream-processor setting do not directly extend to more complex stream references. In this paper we propose a transformation that automatically optimizes stream programs to exploit the reuse supplied by loop-dependent stream references. The transformation rests on three results: algorithms that recognize the reuse supplied by stream references, a new abstract representation called the Stream Reuse Graph (SRG) that depicts the reuse, and an optimization of the SRG for the transformation. We exploit both the reuse between whole sequences accessed by stream references and the reuse between partial sequences. In particular, the problem of exploiting partial stream reuse has no counterpart in the traditional data reuse setting (for scalars and arrays). Finally, we have implemented our techniques in the StreamC/KernelC compiler for Imagine. Experimental results show speedups of 1.14 to 2.54 times over a range of typical stream processing application kernels.
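To make the distinction between whole and partial reuse concrete, the following is a minimal sketch in Python. It is not the paper's algorithm and does not model the SRG. The helper names (`stream_range`, `words_loaded`), the window sizes, and the assumption that only the previous iteration's stream stays resident in the SRF are all hypothetical, chosen purely for illustration. It counts how many words a loop would fetch from memory when each iteration loads a stream whose start depends on the loop index.

```python
def stream_range(start, length):
    """Half-open index range [start, start + length) touched by one stream load."""
    return range(start, start + length)


def words_loaded(n_iters, start_of, length, reuse=True):
    """Count words fetched from memory over n_iters loop iterations.

    start_of(i) gives the start index of the stream loaded in iteration i,
    i.e. a loop-dependent stream reference. With reuse=True, words that were
    part of the previous iteration's stream are assumed to still be in the
    SRF and are not fetched again (a simplifying assumption that ignores
    SRF capacity and allocation).
    """
    loaded = 0
    previous = set()
    for i in range(n_iters):
        current = set(stream_range(start_of(i), length))
        loaded += len(current - previous) if reuse else len(current)
        previous = current
    return loaded


# Partial reuse: consecutive iterations slide a 64-word window by 16 words,
# so each iteration after the first shares 48 words with its predecessor.
no_reuse = words_loaded(8, lambda i: 16 * i, 64, reuse=False)
partial = words_loaded(8, lambda i: 16 * i, 64, reuse=True)

# Whole reuse: every iteration reads exactly the same 64-word stream.
whole = words_loaded(8, lambda i: 0, 64, reuse=True)

print(no_reuse, partial, whole)
```

Under these assumptions the sliding-window case only fetches the newly exposed tail of each stream, while the whole-reuse case fetches the stream once. A real compiler must also decide whether the reused data can actually stay resident in the SRF, which this sketch does not attempt.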
CITED BY
Xuejun Yang, Li Wang, Jingling Xue, Yu Deng, Ying Zhang, Comparability graph coloring for optimizing utilization of stream register files in stream processors, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 14-18, 2009, Raleigh, NC, USA