ACM Home Page
Please provide us with feedback. Feedback
Mostly static program partitioning of binary executables
Full text PdfPdf (2.23 MB)
Source
ACM Transactions on Programming Languages and Systems (TOPLAS) archive
Volume 31 ,  Issue 5  (June 2009) table of contents
Article No. 17  
Year of Publication: 2009
ISSN:0164-0925
Authors
Efe Yardimci  University of California, Irvine
Michael Franz  University of California, Irvine
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): n/a,   Downloads (12 Months): n/a,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1538917.1538918
What is a DOI?

ABSTRACT

We have built a runtime compilation system that takes unmodified sequential binaries and improves their performance on off-the-shelf multiprocessors using dynamic vectorization and loop-level parallelization techniques. Our system, Azure, is purely software based and requires no specific hardware support for speculative thread execution, yet it is able to break even in most cases; that is, the achieved speedup exceeds the cost of runtime monitoring and compilation, often by significant amounts.

Key to this remarkable performance is an offline preprocessing step that extracts a mostly correct control flow graph (CFG) from the binary program ahead of time. This statically obtained CFG is incomplete in that it may be missing some edges corresponding to computed branches. We describe how such additional control flow edges are discovered and handled at runtime, so that an incomplete static analysis never leads to an incorrect optimization result.

The availability of a mostly correct CFG enables us to statically partition a binary executable into single-entry multiple-exit regions and to identify potential parallelization candidates ahead of execution. Program regions that are not candidates for parallelization can thereby be excluded completely from runtime monitoring and dynamic recompilation. Azure's extremely low overhead is a direct consequence of this design.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
Balakrishnan, G. and Reps, T. 2004. Analyzing memory accesses in x86 executables. In Proceedings of the Conference on Compiler Construction. Lecture Notes in Computer Science, vol. 2985, Springer Verlag, 5--23.
 
4
 
5
 
6
 
7
8
 
9
 
10
11
 
12
13
 
14
 
15
16
 
17
 
18
 
19
Kagan, M., Gochman, S., Orenstien, D., and Lin, D. 1997. MMX micro-architecture of Pentium processors with MMX technology and Pentium II micro-processors. Intel Techn. J. 8.
 
20
21
 
22
Klaiber, A. 2000. The technology behind Crusoe processors. White Paper, Transmeta Corp.
23
 
24
 
25
Krishnan, V. S. 1998. Speculative multithreading architectures. Tech. rep. UIUCDCS-R-98-2048, UIUC.
 
26
27
 
28
29
 
30
31
32
 
33
 
34
 
35
Skoglund, J. and Felsberg, M. 2005. Fast image processing using SSE2. In Proceedings of the SSBA Symposium on Image Analysis.
36
37
 
38
Thakkar, S. T. and Huff, T. 1999. The Internet Streaming SIMD Extensions. Intel Tech. J., 8.
 
39
 
40
41
 
42
43
 
44
45
 
46

Collaborative Colleagues:
Efe Yardimci: colleagues
Michael Franz: colleagues