ACM Home Page
Please provide us with feedback. Feedback
Cache aware optimization of stream programs
Full text PdfPdf (219 KB)
Source ACM SIGPLAN Notices archive
Volume 40 ,  Issue 7  (July 2005) table of contents
Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
SESSION: Memory optimization table of contents
Pages: 115 - 126  
Year of Publication: 2005
ISSN:0362-1340
Also published in ...
Authors
Janis Sermulins  Massachusetts Institute of Technology
William Thies  Massachusetts Institute of Technology
Rodric Rabbah  Massachusetts Institute of Technology
Saman Amarasinghe  Massachusetts Institute of Technology
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 13,   Downloads (12 Months): 81,   Citation Count: 5
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1070891.1065927
What is a DOI?

ABSTRACT

Effective use of the memory hierarchy is critical for achieving high performance on embedded systems. We focus on the class of streaming applications, which is increasingly prevalent in the embedded domain. We exploit the widespread parallelism and regular communication patterns in stream programs to formulate a set of cache aware optimizations that automatically improve instruction and data locality. Our work is in the context of the Synchronous Dataflow model, in which a program is described as a graph of independent actors that communicate over channels. The communication rates between actors are known at compile time, allowing the compiler to statically model the caching behavior.We present three cache aware optimizations: 1) execution scaling, which judiciously repeats actor executions to improve instruction locality, 2) cache aware fusion, which combines adjacent actors while respecting instruction cache constraints, and 3) scalar replacement, which converts certain data buffers into a sequence of scalar variables that can be register allocated. The optimizations are founded upon a simple and intuitive model that quantifies the temporal locality for a sequence of actor executions. Our implementation of cache aware optimizations in the StreamIt compiler yields a 249% average speedup (over unoptimized code) for our streaming benchmark suite on a StrongARM 1110 processor. The optimizations also yield a 154% speedup on a Pentium 3 and a 152% speedup on an Itanium 2.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
C. S. Bhattacharyya, P. K. Murthy, and E. A. Lee. APGAN and RPMC: Complementary Heuristics for Translating DSP Block Diagrams into Efficient Software Implementations. Journal of Design Automation for Embedded Systems., pages 33--60, January 1997.
 
4
 
5
 
6
I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: Stream Computing on Graphics Hardware. In SIGGRAPH, 2004.
 
7
R. Govindarajan, G. Gao, and P. Desai. Minimizing memory requirements in rate-optimal schedules. In Proceedings of the 1994 Int. conference on Application Specific Array Processors, pages 75--86, August 1994.
 
8
N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud. The synchronous data-flow programming language LUSTRE. Proc. of the IEEE, 79(9), 1991.
 
9
10
 
11
S. Kohli. Cache aware scheduling of synchronous dataflow programs. Master's Report Technical Memorandum UCB/URL M04/03, UC Berkeley, 2004.
 
12
E. A. Lee. Overview of the Ptolemy Project. Technical report, Tech Memo UCB/ERL M03/25, UC Berkeley, 2003.
 
13
 
14
R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal, 1970.
 
15
 
16
P. K. Murthy and S. S. Bhattacharyya. Buffer Merging --- A Powerful Technique for Reducing Memory Requirements of Synchronous Dataflow Specifications. Technical report, Inst. for Adv. Computer Studies, UMD College Park, 2000.
17
 
18
J. Sermulins. In preparation. Master's thesis, MIT CSAIL, 2005.
 
19
R. Stephens. A Survey of Stream Processing. Acta Informatica, 34(7), 1997.
 
20
D. Tennenhouse and V. Bose. The SpectrumWare Approach to Wireless Signal Processing. Wireless Networks, 1999.
 
21


Collaborative Colleagues:
Janis Sermulins: colleagues
William Thies: colleagues
Rodric Rabbah: colleagues
Saman Amarasinghe: colleagues