Challenges in exploitation of loop parallelism in embedded applications
Source
International Conference on Hardware Software Codesign
Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Seoul, Korea
SESSION: Programming models for multiprocessor systems: from supercomputing programming to multiprocessors on a chip
Pages: 173 - 180
Year of Publication: 2006
ISBN: 1-59593-370-0
Authors
Arun Kejariwal, University of California at Irvine, Irvine, CA, USA
Alexander V. Veidenbaum, University of California at Irvine, Irvine, CA, USA
Alexandru Nicolau, University of California at Irvine, Irvine, CA, USA
Milind Girkar, Intel Corporation, Santa Clara, CA, USA
Xinmin Tian, Intel Corporation, Santa Clara, CA, USA
Hideki Saito, Intel Corporation, Santa Clara, CA, USA
ABSTRACT
Embedded processors have been increasingly exploiting hardware parallelism. Vector units, multiple processors or cores, hyper-threading, special-purpose accelerators such as DSPs or cryptographic engines, or a combination of the above have appeared in a number of processors. They serve to address the increasing performance requirements of modern embedded applications. How this hardware parallelism can be exploited by applications is directly related to the amount of parallelism inherent in a target application. In this paper we evaluate the performance potential of different types of parallelism, viz., true thread-level parallelism, speculative thread-level parallelism and vector parallelism, when executing loops. Applications from the industry-standard EEMBC 1.1, EEMBC 2.0 and the MiBench embedded benchmark suites are analyzed using the Intel C compiler. The results show what can be achieved today, provide upper bounds on the performance potential of different types of thread parallelism, and point out a number of issues that need to be addressed to improve performance. The latter include parallelization of libraries such as libc and design of parallel algorithms to allow maximal exploitation of parallelism. The results also point to the need for developing new benchmark suites more suitable to parallel compilation and execution.
CITED BY 3
Arun Kejariwal, Alexander V. Veidenbaum, Alexandru Nicolau, Milind Girkar, Xinmin Tian, Hideki Saito, On the exploitation of loop-level parallelism in embedded applications, ACM Transactions on Embedded Computing Systems (TECS), v.8 n.2, p.1-34, January 2009