ABSTRACT
Low-power design criteria for embedded systems have led to many innovative architectures. One of the core architectural changes of the recent past is the introduction of streaming registers. These architectures have been shown to be efficient in both power and performance. However, code has to be mapped onto them efficiently to make maximal use of their potential. This paper introduces a novel technique for compiling C code onto streaming registers. The proposed technique exploits not only the temporal locality in arrays but also their spatial locality to map code onto streaming registers. The proposed Stream Register Allocation (SARA) technique is also shown to provide good mapping efficiency and to scale to realistic applications.