ABSTRACT
Multimedia systems today perform huge numbers of memory accesses and have large memory footprints. To alleviate the impact of these accesses and reduce the memory footprint, high-level memory exploration and optimization techniques have been proposed. These techniques try to utilize the memory hierarchy more efficiently. An important step in these techniques is loop transformation (LT), which has a crucial effect on later data memory footprint optimization steps and on code generation. However, state-of-the-art work has focused only on individual objectives. The main one in the literature is improving the locality of data accesses and thereby reducing the data memory footprint. This work does not consider the trade-offs in the LT step in relation to successive optimization steps, and is therefore not globally efficient in mapping the application onto the target platform. In this article we discuss several trade-offs that arise during loop transformations. To our knowledge, we are the first to consider these global trade-offs. Previous work has mostly produced a single solution, the one with the best locality and thus the smallest memory footprint, although some research on two-dimensional trade-offs in this area exists as well. We start from this state-of-the-art solution with minimal footprint and show that by sacrificing footprint we can obtain gains in data reuse (crucial for energy reduction) and reduce the control-flow complexity. We demonstrate our approach on a real-life application, the QSDPCM video coder. We show that considering trade-offs for this application leads to a 16% energy reduction in a two-layer memory subsystem and a 10% cycle reduction on the ARM platform.