ACM Home Page
Please provide us with feedback. Feedback
Slicing based code parallelization for minimizing inter-processor communication
Full text PdfPdf (586 KB)
Source
International Conference on Compilers, Architecture and Synthesis for Embedded Systems archive
Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems table of contents
Grenoble, France
SESSION: Parallelism, streams, and spilling table of contents
Pages 87-96  
Year of Publication: 2009
ISBN:978-1-60558-626-7
Authors
Mahmut Kandemir  The Pennsylvania State University, State College, PA, USA
Yuanrui Zhang  The Pennsylvania State University, State College, PA, USA
Sai Prasanth Muralidhara  The Pennsylvania State University, State College, PA, USA
Ozcan Ozturk  Bilkent University, Ankara, Turkey
Sri Hari Krishna Narayanan  The Pennsylvania State University, State College, PA, USA
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
ACM: Association for Computing Machinery
SIGBED: ACM Special Interest Group on Embedded Systems
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 14,   Citation Count: 0
Additional Information:

abstract   references   index terms  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1629395.1629409
What is a DOI?

ABSTRACT

One of the critical problems in distributed memory multi-core architectures is scalable parallelization that minimizes inter-processor communication. Using the concept of iteration space slicing, this paper presents a new code parallelization scheme for data-intensive applications. This scheme targets distributed memory multi-core architectures, and formulates the problem of data-computation distribution (partitioning) across parallel processors using slicing such that, starting with the partitioning of the output arrays, it iteratively determines the partitions of other arrays as well as iteration spaces of the loop nests in the application code. The goal is to minimize inter-processor data communications. Based on this iteration space slicing based formulation of the problem, we also propose a solution scheme. The proposed data-computation scheme is evaluated using six data-intensive benchmark programs. In our experimental evaluation, we also compare this scheme against three alternate data-computation distribution schemes. The results obtained are very encouraging, indicating around 10% better speedup, with 16 processors, over the next-best scheme when averaged over all benchmark codes we tested.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
H. Agrawal and J. R. Horgan. Dynamic program slicing. In Proceedings of the ACM Conference on Programming Language Design and Implementation, 1990.
 
2
S. P. Amarasinghe and M. S. Lam. Communication optimization and code generation for distributed memory machines. In ACM SIGPLAN Notices, 1993.
 
3
J. M. Anderson. Automatic computation and data decomposition for multiprocessors. Ph.D Thesis, Stanford University, March 1997.
 
4
P. Banerjee, J. A. Chandy, M. Gupta, J. G. Holm, A. Lain, D. J. Palermo, S. Ramaswamy, and E. Su. The paradigm compiler for distributed-memory multicomputers. In IEEE Computer, 1995.
 
5
A. Basumallik, S.-J. Min, and R. Eigenmann. Programming distributed systems using Openmp. In Proceedings of HIPS, 2007.
 
6
R. Brightwell, R. Riesen, K. D. Underwood. Analyzing the impact of overlap, offload, and independent progress for message passing interface applications. In International Journal of High Performance Computing Applications, May 2005.
 
7
S. Chakrabarti, M. Gupta, and J. deok Choi. Global communication analysis and optimization. In Proceedings of the ACM Conference on Programming Language Design and Implementation, 1996.
 
8
S. Chatterjee, J. R. Gilbert, F. J. E. Long, R. Schreiber, S. Hua Teng, D. John, and R. Gilbert. Generating local addresses and communication sets for data-parallel programs. In Journal of Parallel and Distributed Computing, 1995.
 
9
A. Danalis , K.-Y. Kim , L. Pollock , M. Swany. Transformations to parallel codes for communication-computation overlap. In Proceedings of the ACM/IEEE Conference on Supercomputing, 2005.
 
10
P. Feautrier, J.F. Collard, C. Bastoul. Solving systems of affine (in)equalities. Technical Report, PRiSM,Versailles University, France, 2002.
 
11
K. B. Gallagher and J. R. Lyle. Using program slicing in software maintenance. In IEEE Transactions on Software Engineering, 17:751--761, 1991.
 
12
C. Gong, R. Gupta, and R. Melhem. Compilation techniques for optimizing communication on distributed-memory systems. In In Proc. International Conference on Parallel Processing, Volume II, 1993.
 
13
M. Gupta and P. Banerjee. Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers. IEEE Transactions on Parallel and Distributed Systems, 3:179--193, 1992.
 
14
M. W. Hall, M. W. Hall, S. Hiranandani, S. Hiranandani, K. Kennedy, K. Kennedy, C. wen Tseng, and C. wen Tseng. Compiling Fortran D for MIMD distributed--memory machines. Communications of the ACM, 35:66--80, 1992.
 
15
M. J. Harrold and N. Ci. Reuse-driven interprocedural slicing. In In Proceedings of the 20th International Conference on Software Engineering, 1998.
 
16
High Performance Fortran. http://www.netlib.org/hpf/
 
17
S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. In ACM Transactions on Programming Languages and Systems, 12:26--60, 1990.
 
18
K. Kennedy and K. Kennedy. Combining dependence and data-flow analyses to optimize communication. In Proceedings of the 9th International Parallel Processing Symposium, 1995.
 
19
K. Kennedy and A. Sethi. Resource-based communication placement analysis. In Proceedings of the 9th Workshop on Language and Compilers for Parallel Computing, 1996.
 
20
R. Komondoor and S. Horwitz. Using slicing to identify duplication in source code. In Proceedings of the 8th International Symposium on Static Analysis, 2001.
 
21
D. Liang and M. J. Harrold. Reuse-driven interprocedural slicing in the presence of pointers and recursion. In Proceedings of International Conference on Software Maintenance, 1999.
 
22
MPI Standard. http://www-unix.mcs.anl.gov/mpi/
 
23
The Omega Project. http://www.cs.umd.edu/projects/omega/
 
24
L.-N. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache. Iterative optimization in the polyhedral model: Part I, one-dimensional time. In Proceedings of CGO, pp. 144--156, 2007.
 
25
E. Rosser and W. Pugh. Iteration space slicing and its application to communication optimization. In Proceedings of the International Conference on Supercomputing,,1997.
 
26
T. Reps and W. Yang. The semantics of program slicing. Technical Report, University of Wisconsin, 1988.
 
27
 
28
 
29
F. Tip. A survey of program slicing techniques. Journal of Programming Languages, 3:121--189, 1995.
 
30
S. Verdoolaege, K. Beyls, M. Bruynooghe, and F. Catthoor. Experiences with Enumeration of Integer Projections of Parametric Polytopes. In Proceedings of the Compiler Construction Symposium, pp. 91--105, 2005.
 
31
M. Weiser. Program slicing. In Proceedings of the 5th international conference on Software engineering, 1981.
 
32
X. Zhang and R. Gupta. Cost effective program slicing. In Proceedings of the Programming Languages Design and Implementation, 2004.
 
33
J. Anderson and M. Lam. Global optimizations for parallelism and locality on scalable parallel machines. In PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, pp.112--125, 1993.
 
34
M. Wolf and M. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE Trans. Parallel Distrib. Syst. 2, 4:452--471,1991.
 
35
K. Bondalapati. Parallelizing DSP nested loops on reconfigurable architectures using data context switching. In Proc. of the 38th Design Automation Conference, pp.273--276, 2001.
 
36
G. Goumas, N. Drosinos, M. Athanasaki and N. Koziris. Automatic parallel code generation for tiled nested loops. In Proc. of the ACM Symposium on Applied Computing, pp. 1412--1419, 2004.
 
37
B. Mei, S. Vernalde, D. Verkest, H. Man and R. Lauwereins. Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling. In Proc. of the Conference on Design, Automation and Test in Europe, pp. 10296--10301, 2003.
 
38
K. Hogstedt, L. Carter and J. Ferrante. On the parallel execution time of tiled loops. IEEE Transactions on Parallel Distributed Systems 14, 3:307--321, 2003.
 
39
A. Navarro, E. Zapata and D. Padua. Compiler techniques for the distribution of data and computation. IEEE Transactions on Parallel Distributed Systems 14, 6:545--562, 2003.