Dynamic parallelization and mapping of binary executables on hierarchical platforms
Conference on Computing Frontiers
Proceedings of the 3rd Conference on Computing Frontiers, Ischia, Italy
Session: Compilation and dynamic optimization
Pages: 127-138
Year of Publication: 2006
ISBN: 1-59593-302-6
Bibliometrics: Downloads (6 weeks): 9, Downloads (12 months): 47, Citation count: 1
ABSTRACT
As performance improvements are increasingly sought through coarse-grained parallelism, long-standing expectations of continued increases in sequential performance are no longer being met. Current trends in computing point towards platforms that seek performance through various degrees of parallelism, with coarse-grained parallel features becoming commonplace even in entry-level systems.

Yet the broad variety of multiprocessor configurations that will be available, differing in the number of processing elements, will make it difficult to statically create a single parallel version of a program that performs well across the whole range of such hardware. As a result, there will soon be a vast number of multiprocessor systems that are significantly under-utilized for lack of software that harnesses their power effectively. This problem is exacerbated by the growing inventory of legacy programs that exist only as binary executables whose source code may be unavailable.

We present a system that improves the performance of optimized sequential binaries through dynamic recompilation. Leveraging observations made at runtime, a thin software layer recompiles executing code that was compiled for a uniprocessor and generates parallelized and/or vectorized code segments that exploit the available parallel resources. Among the techniques employed are control speculation, loop distribution across several threads, and automatic parallelization of recursive routines.

Our solution is entirely software-based and can be ported to existing hardware platforms that have parallel processing capabilities. Our performance results are obtained on real hardware, without simulation.

In preliminary benchmarks on only modestly parallel (2-way) hardware, our system already provides speedups of up to 40% on SPEC CPU benchmarks, and near-optimal speedups on more obviously parallelizable benchmarks.
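The abstract lists loop distribution across several threads among the techniques used. As a rough illustration only, and assuming the phrase refers to splitting a loop's iteration space among threads, the Python sketch below partitions a loop with independent iterations into contiguous chunks, one per thread. It is not the paper's method: the described system rewrites binary code at runtime, which this sketch does not attempt. The names `chunk_bounds`, `loop_body`, and `parallel_loop` are hypothetical. In CPython the global interpreter lock prevents these threads from executing Python bytecode simultaneously, so the sketch shows the partitioning logic rather than an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, num_threads):
    """Split the iteration range [0, n) into num_threads contiguous chunks.

    The first (n % num_threads) chunks get one extra iteration, so chunk
    sizes differ by at most one. Chunks may be empty when n < num_threads.
    """
    base, extra = divmod(n, num_threads)
    bounds = []
    start = 0
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def loop_body(a, b, out, lo, hi):
    # Hypothetical loop body with independent iterations: out[i] depends
    # only on a[i] and b[i], so disjoint chunks need no synchronization.
    for i in range(lo, hi):
        out[i] = a[i] * b[i]


def parallel_loop(a, b, num_threads=2):
    """Run the element-wise loop with its iterations split across threads.

    The default of 2 threads mirrors the 2-way hardware in the abstract.
    """
    n = len(a)
    out = [0] * n
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(loop_body, a, b, out, lo, hi)
                   for lo, hi in chunk_bounds(n, num_threads)]
        for f in futures:
            f.result()  # re-raise any exception from a worker thread
    return out


if __name__ == "__main__":
    print(chunk_bounds(5, 2))  # [(0, 3), (3, 5)]
    print(parallel_loop([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

Contiguous chunks keep each thread's memory accesses sequential. The transformation is only safe because no iteration reads a value written by another, which is the kind of property a runtime parallelizer would have to establish or speculate on before distributing a loop.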