ACM Home Page
Please provide us with feedback. Feedback
Locality-conscious workload assignment for array-based computations in MPSOC architectures
Full text PdfPdf (1.36 MB)
Source Annual ACM IEEE Design Automation Conference archive
Proceedings of the 42nd annual Design Automation Conference table of contents
Anaheim, California, USA
SESSION: Embedded software table of contents
Pages: 95 - 100  
Year of Publication: 2005
ISBN:1-59593-058-2
Authors
Feihui Li  The Pennsylvania State University
Mahmut Kandemir  The Pennsylvania State University
Sponsors
ACM: Association for Computing Machinery
SIGDA: ACM Special Interest Group on Design Automation
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 17,   Citation Count: 5
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1065579.1065609
What is a DOI?

ABSTRACT

While the past research discussed several advantages of multiprocessor-system-on-a-chip (MPSOC) architectures from both area utilization and design verification perspectives over complex single core based systems, compilation issues for these architectures have relatively received less attention. Programming MPSOCs can be challenging as several potentially conflicting issues such as data locality, parallelism and load balance across processors should be considered simultaneously. Most of the compilation techniques discussed in the literature for parallel architectures (not necessarily for MPSOCs) are loop based, i.e., they consider each loop nest in isolation. However, one key problem associated with such loop based techniques is that they fail to capture the interactions between the different loop nests in the application. This paper takes a more global approach to the problem and proposes a compiler-driven data locality optimization strategy in the context of embedded MPSOCs. An important characteristic of the proposed approach is that, in deciding the workloads of the processors (i.e., in parallelizing the application) it considers all the loop nests in the application simultaneously. Our experimental evaluation with eight embedded applications shows that the global scheme brings significant power/performance benefits over the conventional loop based scheme.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
A. Agarwal, D. Kranz, and V. Natarajan. Automatic partitioning of parallel loops and data arrays for distributed shared memory multiprocessors. In Proc. International Conference on Parallel Processing, 1993.
 
2
S. P. Amarasinghe, J. M. Anderson, M. S. Lam, and C. W. Tseng. The SUIF compiler for scalable parallel machines. In Proc. SIAM Conference on Parallel Processing for Scientific Computing, February, 1995.
 
3
 
4
5
6
 
7
8
 
9
 
10
 
11
12
 
13
MP98: a mobile processor. http://www.labs.nec.co.jp/MP98/top-e.htm.
14
 
15
The Omega Project. http://www.cs.umd.edu/projects/omega/
16
17
18
19


Collaborative Colleagues:
Feihui Li: colleagues
Mahmut Kandemir: colleagues