ABSTRACT
While past research has discussed several advantages of multiprocessor-system-on-a-chip (MPSOC) architectures over complex single-core systems, from both area utilization and design verification perspectives, compilation issues for these architectures have received relatively little attention. Programming MPSOCs can be challenging because several potentially conflicting issues, such as data locality, parallelism, and load balance across processors, must be considered simultaneously. Most compilation techniques discussed in the literature for parallel architectures (not necessarily MPSOCs) are loop based, i.e., they consider each loop nest in isolation. A key problem with such loop-based techniques is that they fail to capture the interactions between the different loop nests in the application. This paper takes a more global approach and proposes a compiler-driven data locality optimization strategy for embedded MPSOCs. An important characteristic of the proposed approach is that, in deciding the workloads of the processors (i.e., in parallelizing the application), it considers all the loop nests in the application simultaneously. Our experimental evaluation with eight embedded applications shows that the global scheme brings significant power/performance benefits over the conventional loop-based scheme.
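To make the interaction between loop nests concrete, the following is a minimal sketch and not the paper's algorithm. It assumes two hypothetical loop nests over the same N x N array, one accessing A[i][j] and the other A[j][i], a block distribution of iterations over P processors, and it counts how many array elements each processor touches in both nests. All function names and parameters here are illustrative assumptions.

```python
def block_owner(index, n, num_procs):
    """Processor that owns `index` under a block distribution of range(n)."""
    block = (n + num_procs - 1) // num_procs
    return index // block


def touched(n, num_procs, access, parallel_loop):
    """Map each processor to the set of array elements it accesses in one nest.

    access(i, j) returns the (row, col) element referenced at iteration (i, j);
    parallel_loop is 'i' or 'j', the loop whose iterations are split across
    processors.
    """
    sets = [set() for _ in range(num_procs)]
    for i in range(n):
        for j in range(n):
            owner_index = i if parallel_loop == "i" else j
            p = block_owner(owner_index, n, num_procs)
            sets[p].add(access(i, j))
    return sets


def reuse(first, second):
    """Total number of elements that a processor accesses in both nests."""
    return sum(len(a & b) for a, b in zip(first, second))


N, P = 8, 4

# Nest 1 references A[i][j] and is parallelized over its outer loop i.
nest1 = touched(N, P, lambda i, j: (i, j), "i")

# Loop-based choice for nest 2 (A[j][i]): parallelize the outer loop i again,
# ignoring what each processor touched in nest 1.
nest2_isolated = touched(N, P, lambda i, j: (j, i), "i")

# Choice that takes nest 1 into account: parallelize loop j, so each processor
# revisits the rows it already accessed in nest 1.
nest2_global = touched(N, P, lambda i, j: (j, i), "j")

print(reuse(nest1, nest2_isolated), reuse(nest1, nest2_global))  # 16 64
```

In this toy setting, deciding each nest in isolation leaves every processor reusing only a 2 x 2 tile of the data it touched earlier, while the choice made with both nests in view lets it reuse all 16 elements it owns.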