Cache-aware iteration space partitioning
Source
Principles and Practice of Parallel Programming
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Salt Lake City, UT, USA
POSTER SESSION: Poster session
Pages 269-270  
Year of Publication: 2008
ISBN:978-1-59593-795-7
Authors
Arun Kejariwal  UC Irvine, Irvine, USA
Alexandru Nicolau  UC Irvine, Irvine, USA
Utpal Banerjee  Intel, Santa Clara, USA
Alexander V. Veidenbaum  UC Irvine, Irvine, USA
Constantine D. Polychronopoulos  UIUC, Urbana-Champaign, USA
Sponsors
SIGPLAN: ACM Special Interest Group on Programming Languages
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1345206.1345250

ABSTRACT

The need for high performance per watt has led to the development of multi-core systems such as the Intel Core 2 Duo processor and the Intel quad-core Kentsfield processor. Fully exploiting the hardware parallelism of such systems requires concurrent software. This, in part, entails parallelizing the program and efficiently mapping the parallelized program onto the cores. The mapping determines the load balance across cores, which in turn has a direct impact on performance. Because parallel loops, such as a parallel DO loop in Fortran, account for a large percentage of total execution time, we focus on how to efficiently partition the iteration space of (possibly) nested perfect or non-perfect parallel loops. A key aspect of this problem is capturing cache behavior efficiently, since the cache subsystem is often the main performance bottleneck in multi-core systems. In this paper, we present a novel profile-guided compiler technique for cache-aware partitioning of the iteration spaces of parallel loops. We present a case study using a kernel from the industry-standard SPEC CPU benchmark suite.
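
To make the load-balance issue in the abstract concrete, the following is a minimal sketch, not the technique presented in the paper. It assumes a hypothetical triangular loop nest (`for i in range(n): for j in range(i + 1)`) and compares two ways of splitting the outer loop across `p` cores: equal-length chunks versus chunks of roughly equal work. The function names are illustrative, and the sketch ignores cache behavior and profiling entirely, which are the paper's actual contribution.

```python
def block_partition(n, p):
    """Split outer iterations 0..n-1 into p contiguous chunks of near-equal length."""
    bounds = [(k * n) // p for k in range(p + 1)]
    return [(bounds[k], bounds[k + 1]) for k in range(p)]


def balanced_partition(n, p):
    """Split outer iterations into p contiguous chunks of roughly equal work,
    assuming the triangular nest: for i in range(n): for j in range(i + 1)."""
    total = n * (n + 1) // 2          # total number of inner iterations
    cuts = [0]
    done = 0                          # work covered by outer iterations 0..i
    for i in range(n):
        done += i + 1
        # Close chunks whose share of the total work has been reached.
        while len(cuts) < p and done * p >= total * len(cuts):
            cuts.append(i + 1)
    cuts += [n] * (p + 1 - len(cuts))  # last chunk always ends at n
    return [(cuts[k], cuts[k + 1]) for k in range(p)]


def work(chunk):
    """Number of inner iterations executed by one chunk (lo inclusive, hi exclusive)."""
    lo, hi = chunk
    return sum(i + 1 for i in range(lo, hi))


if __name__ == "__main__":
    n, p = 8, 2
    for name, part in (("block", block_partition(n, p)),
                       ("balanced", balanced_partition(n, p))):
        print(name, part, [work(c) for c in part])
```

With `n = 8` and `p = 2`, the equal-length split assigns 10 and 26 inner iterations to the two cores, while the work-based split assigns 21 and 15. A cache-aware scheme of the kind the abstract describes would additionally weight iterations by their observed memory cost rather than by a simple iteration count.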

