ABSTRACT
There is a trend towards using accelerators to increase the performance and energy efficiency of general-purpose processors. Adoption of accelerators, however, depends on the availability of tools that facilitate programming these devices. In this paper, we present techniques for automatically partitioning programs for execution on accelerators. We call the off-loaded code regions sub-algorithms: parts of the program that are only loosely connected to the remainder of the program. We present three heuristics for automatically identifying sub-algorithms based on control-flow and data-flow properties. Analysis of the SPECint and MiBench benchmarks shows that on average 12 sub-algorithms are identified per benchmark (up to 54), and that these sub-algorithms cover the full execution time for 27 out of 30 benchmarks. We show that these sub-algorithms are suitable for off-loading to accelerators by manually implementing sub-algorithms from 2 SPECint benchmarks on the Cell processor.
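The abstract characterizes a sub-algorithm as a region that is loosely connected to the rest of the program, but it does not spell out the three heuristics. As a minimal sketch of one way that notion could be quantified (an illustrative assumption, not the paper's method), the Python function below takes a hypothetical weighted graph of program regions, where each edge weight stands for the amount of data passed from one region to another, and reports what fraction of a candidate region's edge weight crosses its boundary. The name `boundary_ratio`, the toy graph `EDGES`, and its weights are all made up for illustration.

```python
# Hypothetical sketch: quantify how "loosely connected" a candidate
# sub-algorithm is. This is NOT the paper's heuristic; the graph and
# weights below are invented for illustration only.

# Directed edges (src, dst, weight); weight is an assumed measure such as
# the number of bytes flowing from src to dst during a profiled run.
EDGES = [
    ("main", "parse", 10),
    ("parse", "tokenize", 200),
    ("tokenize", "parse", 150),
    ("parse", "main", 5),
    ("main", "compress", 8),
    ("compress", "huffman", 300),
    ("huffman", "compress", 250),
    ("compress", "main", 4),
]


def boundary_ratio(edges, region):
    """Return cut / (cut + internal) edge weight for a candidate region.

    A value near 0 means almost all traffic stays inside the region
    (loosely connected to the rest of the program); a value of 1.0 means
    every edge touching the region crosses its boundary.
    """
    internal = 0
    cut = 0
    for src, dst, weight in edges:
        src_in = src in region
        dst_in = dst in region
        if src_in and dst_in:
            internal += weight  # edge entirely inside the region
        elif src_in or dst_in:
            cut += weight  # edge crosses the region boundary
    total = internal + cut
    return cut / total if total else 0.0


if __name__ == "__main__":
    # A cohesive region: only 12 of 562 weight units cross its boundary.
    print(f"{boundary_ratio(EDGES, {'compress', 'huffman'}):.3f}")
    # An arbitrary grouping: every touching edge crosses the boundary.
    print(f"{boundary_ratio(EDGES, {'parse', 'compress'}):.3f}")
```

Running the script prints `0.021` for the `{compress, huffman}` region and `1.000` for the `{parse, compress}` grouping. The intuition is that a region with a small ratio exchanges little data with its surroundings and could be off-loaded with little communication; a practical tool would likely also weigh execution time and control-flow structure, which this sketch ignores.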