ACM Home Page
Please provide us with feedback. Feedback
Parallelizing load/stores on dual-bank memory embedded processors
Full text PdfPdf (747 KB)
Source ACM Transactions on Embedded Computing Systems (TECS) archive
Volume 5 ,  Issue 3  (August 2006) table of contents
Pages: 613 - 657  
Year of Publication: 2006
ISSN:1539-9087
Authors
Xiaotong Zhuang  Georgia Institute of Technology, Atlanta, Georgia
Santosh Pande  Georgia Institute of Technology, Atlanta, Georgia
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): n/a,   Downloads (12 Months): n/a,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1165780.1165784
What is a DOI?

ABSTRACT

Many modern embedded processors such as DSPs support partitioned memory banks (also called X--Y memory or dual-bank memory) along with parallel load/store instructions to achieve higher code density and performance. In order to effectively utilize the parallel load/store instructions, the compiler must partition the memory-resident values and assign them to X or Y bank. This paper gives a postregister allocation solution to merge the generated load/store instructions into their parallel counterparts. Simultaneously, our framework performs allocation of values to X or Y memory banks. We first remove as many load/stores and register--register moves as possible through an excellent iterated coalescing based register allocator by Appel and George [1996]. We then attempt to parallelize the generated load/stores using a multipass approach. The basic phase of our approach attempts the merger of load/stores without duplication and web splitting. We model this problem as a graph-coloring problem in which each value is colored as either X or Y. We then construct a motion scheduling graph (MSG), based on the range of motion for each load/store instruction. MSG reflects potential instructions that could be merged. We propose a notion of pseudofixed boundaries so that the load/store movement is less affected by register dependencies. We prove that the coloring problem for MSG is NP-complete and solve it with two different heuristic algorithms with different complexity. We then propose a two-level iterative process to attempt instruction duplication, variable duplication, web splitting, and local conflict elimination to effectively merge the remaining load/stores. Finally, we clean up some multiple-aliased load/stores. To improve the performance, we combine profiling information with each stage coupled with some modifications to the algorithm. We show that our framework results in parallelization of a large number of load/stores without much growth in data and code segments. The average speedup for our optimization pass reaches roughly 13% if no profile information is available and 17% with profile information. The average code and data segment growth is controlled within 13%.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
3
 
4
Chaitin, G.J., Auslander, M. A., Chandra, A. K., Cocke, J., Hopkins, M. E., and Markstein, P. 1981. Register allocation via coloring. Computer Language, 6, 47--57.
5
6
7
8
 
9
George, L. and Appel, A. W. 1996. Iterated register coalescing. In Proc. SIGPLAN '96 Conf. Programming Language Design and Implementation.
 
10
11
 
12
Leupers, R. and Kotte, D. 2001. Variable partitioning for dual memory bank DSPs. ICASSP (May).
 
13
Mach-SUIF Backend Compiler, 2000. The Machine-SUIF 2.1 compiler documentation set. Harvard University, Sept. http://ececs.harvard.edu/hube/research/machsuif.html.
 
14
 
15
Powell, B., Lee, E.A., and Newman, W.C. 1992. Direct synthesis of optimized DSP assembly code from signal flow block diagrams. Proceedings International Conference on Acoustics, Speech, and Signal Processing. 553--556.
16
 
17
Stanford SUIF Compiler Infrastructure, 2000. The SUIF 2 Compiler Documentation Set, Stanford University, Sep. http://suif.stanford.edu/suif/index.html.
18
 
19

Collaborative Colleagues:
Xiaotong Zhuang: colleagues
Santosh Pande: colleagues