Dynamic parallelization and mapping of binary executables on hierarchical platforms
Source: Proceedings of the 3rd Conference on Computing Frontiers
Ischia, Italy
SESSION: Compilation and dynamic optimization
Pages: 127-138
Year of Publication: 2006
ISBN: 1-59593-302-6
Authors
Efe Yardimci, University of California, Irvine, Irvine, CA
Michael Franz, University of California, Irvine, Irvine, CA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1128022.1128040

ABSTRACT

As performance improvements are increasingly being sought via coarse-grained parallelism, established expectations of continued increases in sequential performance are not being met. Current trends in computing point towards platforms that seek performance improvements through various degrees of parallelism, with coarse-grained parallel features becoming commonplace even in entry-level systems.

Yet the broad variety of multiprocessor configurations that will become available, differing in their number of processing elements, will make it difficult to statically create a single parallel version of a program that performs well across the whole range of such hardware. As a result, there will soon be a vast number of multiprocessor systems that are significantly under-utilized for lack of software that harnesses their power effectively. This problem is exacerbated by the growing inventory of legacy programs in binary executable form whose source code may be unavailable.

We present a system that improves the performance of optimized sequential binaries through dynamic recompilation. Leveraging observations made at runtime, a thin software layer recompiles executing code that was compiled for a uniprocessor and generates parallelized and/or vectorized code segments that exploit the available parallel resources. Among the techniques employed are control speculation, loop distribution across several threads, and automatic parallelization of recursive routines.

Our solution is entirely software-based and can be ported to existing hardware platforms that have parallel processing capabilities. Our performance results were obtained on real hardware, without simulation.

In preliminary benchmarks on only modestly parallel (2-way) hardware, our system already provides speedups of up to 40% on SPEC CPU benchmarks, and near-optimal speedups on more obviously parallelizable benchmarks.
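As a rough illustration of loop distribution across several threads, the Python sketch below partitions the iteration space of a loop with independent iterations into contiguous chunks, one per thread, defaulting to the 2-way setting mentioned above. It is a source-level sketch under stated assumptions, not the authors' method: their system transforms optimized binary code at runtime, and the function names (`sequential_loop`, `chunk_bounds`, `distributed_loop`) and the loop body are hypothetical. CPython threads share a global interpreter lock, so the sketch shows the shape of the transformation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_loop(a):
    """Hypothetical uniprocessor loop: every iteration is independent."""
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] * a[i] + 1
    return out


def chunk_bounds(n, workers):
    """Split the iteration space [0, n) into `workers` contiguous chunks.

    The first n % workers chunks get one extra iteration, so chunk sizes
    differ by at most one. Some chunks may be empty when n < workers.
    """
    base, extra = divmod(n, workers)
    bounds = []
    start = 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def distributed_loop(a, workers=2):
    """Same loop, with its iterations distributed across `workers` threads."""
    out = [0] * len(a)

    def run_chunk(lo, hi):
        # Each thread writes a disjoint range of `out`, so no locking is needed.
        for i in range(lo, hi):
            out[i] = a[i] * a[i] + 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_chunk, lo, hi)
                   for lo, hi in chunk_bounds(len(a), workers)]
        for f in futures:
            f.result()  # re-raise any exception from a worker thread
    return out


if __name__ == "__main__":
    data = list(range(10))
    print(distributed_loop(data) == sequential_loop(data))
```

Before applying such a split, a runtime system would first have to establish that the iterations really are independent (here each iteration writes only `out[i]`); static block partitioning is only one possible scheduling choice.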



