| Cache-aware iteration space partitioning |
| Full text |
Pdf
(250 KB)
|
Source
|
Principles and Practice of Parallel Programming
archive
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
table of contents
Salt Lake City, UT, USA
POSTER SESSION: Poster session
table of contents
Pages 269-270
Year of Publication: 2008
ISBN:978-1-59593-795-7
|
|
Authors
|
|
Arun Kejariwal
|
UC Irvine, Irvine, USA
|
|
Alexandru Nicolau
|
UC, Irvine, Irvine, USA
|
|
Utpal Banerjee
|
Intel, Santa Clara, USA
|
|
Alexander V. Veidenbaum
|
UC, Irvine, Irvine, USA
|
|
Constantine D. Polychronopoulos
|
UIUC, Urbana Champaign, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 9, Downloads (12 Months): 106, Citation Count: 0
|
|
|
ABSTRACT
The need for high performance per watt has led to the development of multi-core systems such as the Intel Core 2 Duo processor and the Intel quad-core Kentsfield processor. Maximal exploitation of the hardware parallelism supported by such systems necessitates the development of concurrent software. This, in part, entails program parallelization and efficient mapping of the parallelized program onto the different cores. The latter affects the load balance between the different cores which in turn has a direct impact on performance. In light of the fact that parallel loops, such as a parallel DO loop in Fortran, account for a large percentage of the total execution time, we focus on the problem of how to efficiently partition the iteration space of (possibly) nested perfect/non-perfect parallel loops. In this regard, one of the key aspects is how to efficiently capture the cache behavior as the cache subsystem is often the main performance bottleneck in multi-core systems. In this paper, we present a novel profile-guided compiler technique for cache-aware partitioning of iteration spaces of parallel loops. We present a case study using a kernel from the industry-standard SPEC CPU benchmark suite.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
 |
3
|
Arun Kejariwal , Xinmin Tian , Wei Li , Milind Girkar , Sergey Kozhukhov , Hideki Saito , Utpal Banerjee , Alexandru Nicolau , Alexander V. Veidenbaum , Constantine D. Polychronopoulos, On the performance potential of different types of speculative thread-level parallelism: The DL version of this paper includes corrections that were not made available in the printed proceedings, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia
[doi> 10.1145/1183401.1183407]
|
 |
4
|
|
| |
5
|
R. Sakellariou. On the Quest for Perfect Load Balance in Loop-Based Parallel Computations. PhD thesis, Department of Computer Science, University of Manchester, October 1996.
|
| |
6
|
C. Polychronopoulos, D. J. Kuck, and D. A. Padua. Execution of parallel loops on parallel processor systems. In Proceedings of the 1986 International Conference on Parallel Processing, pages 519--527, August 1986.
|
| |
7
|
|
 |
8
|
Arun Kejariwal , Alexandru Nicolau , Hideki Saito , Xinmin Tian , Milind Girkar , Utpal Banerjee , Constantine D. Polychronopoulos, A general approach for partitioning N-dimensional parallel nested loops with conditionals, Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures, July 30-August 02, 2006, Cambridge, Massachusetts, USA
[doi> 10.1145/1148109.1148117]
|
 |
9
|
Arun Kejariwal , Alexandru Nicolau , Utpal Banerjee , Constantine D. Polychronopoulos, A novel approach for partitioning iteration spaces with variable densities, Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, June 15-17, 2005, Chicago, IL, USA
[doi> 10.1145/1065944.1065962]
|
| |
10
|
A. Kejariwal, P. D'Alberto, A. Nicolau, and C. D. Polychronopoulos. A geometric approach for partitioning N-dimensional non-rectangular iteration spaces. In Proceedings of the 17th International Workshop on Languages and Compilers for Parallel Computing, pages 102--116, West Lafayette, IN, 2004.
|
| |
11
|
SPEC CINT2006. http://www.spec.org/cpu2006/CINT2006.
|
| |
12
|
Intel R VTune TM Performance Analyzer 8.0.1 for Windows. http://www.intel.com/cd/software/products/asmo-na/eng/vtune/219898.htm.
|
| |
13
|
|
| |
14
|
C. Polychronopoulos. Loop coalescing: A compiler transformation for parallel machines. In Proceedings of the 1987 International Conference on Parallel Processing, pages 235--242, August 1987.
|
 |
15
|
|
| |
16
|
|
 |
17
|
Siddhartha Chatterjee , Erin Parker , Philip J. Hanlon , Alvin R. Lebeck, Exact analysis of the cache behavior of nested loops, Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, p.286-297, June 2001, Snowbird, Utah, United States
|
| |
18
|
|
|