Mapping parallelism to multi-cores: a machine learning based approach
Source
Principles and Practice of Parallel Programming archive
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming table of contents
Raleigh, NC, USA
SESSION: Task mapping and scheduling table of contents
Pages 75-84  
Year of Publication: 2009
ISBN:978-1-60558-397-6
Authors
Zheng Wang  The University of Edinburgh, Edinburgh, United Kingdom
Michael F.P. O'Boyle  The University of Edinburgh, Edinburgh, United Kingdom
Sponsors
ACM: Association for Computing Machinery
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1504176.1504189

ABSTRACT

The efficient mapping of program parallelism to multi-core processors is highly dependent on the underlying architecture. This paper proposes a portable and automatic compiler-based approach to mapping such parallelism using machine learning. It develops two predictors, one data-sensitive and one data-insensitive, to select the best mapping for parallel programs. They predict the number of threads and the scheduling policy for any given program using a model learnt off-line. By using low-cost profiling runs, they predict the mapping for a new, unseen program across multiple input data sets. We evaluate our approach by selecting parallelism mapping configurations for OpenMP programs on two representative but different multi-core platforms (the Intel Xeon and the Cell processors). The performance of our technique is stable across programs and architectures. On average, it delivers more than 96% of the maximum available performance on both platforms. It achieves, on average, a 37% (up to 17.5 times) performance improvement over the OpenMP runtime default scheme on the Cell platform. Compared to two recent prediction models, our predictors achieve better performance with a significantly lower profiling cost.
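As a rough illustration of what "predicting a mapping from a model learnt off-line" can look like, the sketch below pairs program feature vectors with the best (thread count, scheduling policy) found during training and looks up the nearest match for a new program. This is not the authors' method: the abstract does not name the learning model or the features, so the 1-nearest-neighbour lookup, the two-element feature vectors, the training values, and the names `Mapping`, `TRAINING`, `predict_mapping`, and `relative_performance` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Mapping:
    """A parallelism mapping: thread count plus an OpenMP schedule kind."""
    threads: int
    schedule: str  # e.g. "static" or "dynamic"


# Hypothetical off-line training data: (feature vector, best mapping found).
# The features and values are invented; the abstract does not list them.
TRAINING = [
    ((0.9, 0.1), Mapping(8, "static")),
    ((0.5, 0.6), Mapping(4, "dynamic")),
    ((0.1, 0.9), Mapping(1, "static")),
]


def predict_mapping(features, training=TRAINING):
    """Return the mapping of the closest training example (1-nearest-neighbour).

    A stand-in for whatever model is actually learnt off-line.
    """
    _, best = min(training, key=lambda example: dist(example[0], features))
    return best


def relative_performance(best_time, achieved_time):
    """Fraction of the best available performance that a mapping achieves.

    Times are run times, so lower is better and the ratio is at most 1.0
    when best_time really is the minimum.
    """
    return best_time / achieved_time


if __name__ == "__main__":
    # Features for a new, unseen program, as a profiling run might supply them.
    print(predict_mapping((0.85, 0.2)))
    print(relative_performance(9.6, 10.0))
```

The second helper shows how a figure such as "more than 96% of the maximum available performance" can be read: the run time of the best possible mapping divided by the run time of the predicted one.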

