Integrated code and data placement in two-dimensional mesh based chip multiprocessors
Source
International Conference on Computer Aided Design archive
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design table of contents
San Jose, California
SESSION: Advances in embedded systems table of contents
Pages 583-588  
Year of Publication: 2008
ISSN: 1092-3152, ISBN: 978-1-4244-2820-5
Authors
Taylan Yemliha  Syracuse University, Syracuse, NY
Shekhar Srikantaiah  Pennsylvania State University, University Park, PA
Mahmut Kandemir  Pennsylvania State University, University Park, PA
Mustafa Karakoy  Imperial College, London
Mary Jane Irwin  Pennsylvania State University, University Park, PA
Sponsors
IEEE CASS/CANDE
IEEE Council on Electronic Design Automation (CEDA)
SIGDA: ACM Special Interest Group on Design Automation
Publisher
IEEE Press  Piscataway, NJ, USA
ABSTRACT

As transistor sizes continue to shrink and the number of transistors per chip keeps increasing, chip multiprocessors (CMPs) are becoming a promising way to stay on the current performance trajectory for both high-end and embedded systems. Since future technologies promise to integrate billions of transistors on a chip, the prospect of having hundreds to thousands of processors on a single chip, along with an underlying memory hierarchy and an interconnection system, is entirely feasible. This paper proposes a compiler-directed integrated code and data placement scheme for two-dimensional mesh-based CMP architectures. The proposed approach uses a Code-Data Affinity Graph (CDAG) to represent the relationship between loop iterations and array data, and then assigns sets of loop iterations to processing cores and sets of data blocks to on-chip memories. The mapping process takes into account the on-chip memory capacity, the load imbalance across cores, and the topology of the network-on-chip (NoC). We present two variants of our approach, depth-first placement (DFP) and breadth-first placement (BFP), and compare them to three alternative code/data mapping schemes. The experimental evaluation shows that our CDAG-based placement schemes are very successful in practice, achieving average performance improvements of 19.9% (DFP) and 16.8% (BFP), and average energy improvements of 29.7% (DFP) and 27.8% (BFP).
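
The abstract does not spell out the placement algorithm, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a CDAG can be modeled as a weighted graph whose nodes are iteration sets ("code") and data blocks ("data"), with edge weights standing for access counts. Nodes are visited in depth-first or breadth-first order, and each is placed greedily on the mesh tile that minimizes weighted Manhattan distance to already placed neighbors, subject to a per-tile memory capacity and a simple load-balance penalty. All names here (`place`, `traversal_order`, `balance_weight`, the example graph) are hypothetical.

```python
from collections import deque

def manhattan(a, b):
    """Hop count between two tiles of a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def traversal_order(graph, depth_first):
    """Visit every node in DFS or BFS order, heaviest-affinity neighbors first."""
    order, seen = [], set()
    # Start each component from the node with the largest total affinity.
    starts = sorted(graph, key=lambda n: sum(graph[n].values()), reverse=True)
    for start in starts:
        frontier = deque([start])
        while frontier:
            node = frontier.pop() if depth_first else frontier.popleft()
            if node in seen:
                continue
            seen.add(node)
            order.append(node)
            # A stack pops the last element pushed, so DFS pushes the heaviest last.
            nbrs = sorted(graph[node], key=lambda n: graph[node][n],
                          reverse=not depth_first)
            frontier.extend(n for n in nbrs if n not in seen)
    return order

def place(graph, kinds, width, height, mem_capacity,
          depth_first=True, balance_weight=1.0):
    """Greedy placement (assumed heuristic). Returns {node: (x, y)}."""
    tiles = [(x, y) for y in range(height) for x in range(width)]
    load = {t: {"code": 0, "data": 0} for t in tiles}
    placement = {}
    for node in traversal_order(graph, depth_first):
        kind = kinds[node]

        def cost(tile):
            # Communication cost to neighbors that already have a tile.
            comm = sum(w * manhattan(tile, placement[n])
                       for n, w in graph[node].items() if n in placement)
            # Penalize tiles that already hold many nodes of the same kind.
            return comm + balance_weight * load[tile][kind]

        # Data blocks may only go to tiles with free on-chip memory.
        candidates = [t for t in tiles
                      if kind == "code" or load[t]["data"] < mem_capacity]
        if not candidates:
            raise ValueError("on-chip memory capacity exhausted")
        best = min(candidates, key=cost)
        placement[node] = best
        load[best][kind] += 1
    return placement

def comm_cost(graph, placement):
    """Total weighted hops; each undirected edge is stored twice in `graph`."""
    total = sum(w * manhattan(placement[u], placement[v])
                for u in graph for v, w in graph[u].items())
    return total // 2

# Hypothetical example: two iteration sets and two data blocks on a 2x2 mesh.
graph = {
    "I0": {"D0": 10, "D1": 1},
    "I1": {"D1": 10},
    "D0": {"I0": 10},
    "D1": {"I1": 10, "I0": 1},
}
kinds = {"I0": "code", "I1": "code", "D0": "data", "D1": "data"}
dfp = place(graph, kinds, 2, 2, mem_capacity=1, depth_first=True)
bfp = place(graph, kinds, 2, 2, mem_capacity=1, depth_first=False)
```

In this toy setup, `depth_first=True` loosely corresponds to a DFP-style ordering and `depth_first=False` to a BFP-style ordering; the cost function and tie-breaking are assumptions chosen for brevity rather than anything taken from the paper.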

