Challenges in exploitation of loop parallelism in embedded applications
Source: Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis
Seoul, Korea
Session: Programming models for multiprocessor systems: from supercomputing programming to multiprocessors on a chip
Pages: 173 - 180  
Year of Publication: 2006
ISBN:1-59593-370-0
Authors
Arun Kejariwal  University of California at Irvine, Irvine, CA, USA
Alexander V. Veidenbaum  University of California at Irvine, Irvine, CA, USA
Alexandru Nicolau  University of California at Irvine, Irvine, CA, USA
Milind Girkarmark  Intel Corporation, Santa Clara, CA, USA
Xinmin Tian  Intel Corporation, Santa Clara, CA, USA
Hideki Saito  Intel Corporation, Santa Clara, CA, USA
Sponsors
SIGDA: ACM Special Interest Group on Design Automation
ACM: Association for Computing Machinery
SIGBED: ACM Special Interest Group on Embedded Systems
SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1176254.1176298
ABSTRACT

Embedded processors have been increasingly exploiting hardware parallelism. Vector units, multiple processors or cores, hyper-threading, special-purpose accelerators such as DSPs or cryptographic engines, or a combination of the above have appeared in a number of processors. They serve to address the increasing performance requirements of modern embedded applications. How this hardware parallelism can be exploited by applications is directly related to the amount of parallelism inherent in a target application. In this paper we evaluate the performance potential of different types of parallelism, viz., true thread-level parallelism, speculative thread-level parallelism and vector parallelism, when executing loops. Applications from the industry-standard EEMBC 1.1, EEMBC 2.0 and the MiBench embedded benchmark suites are analyzed using the Intel C compiler. The results show what can be achieved today, provide upper bounds on the performance potential of different types of thread parallelism, and point out a number of issues that need to be addressed to improve performance. The latter include parallelization of libraries such as libc and design of parallel algorithms to allow maximal exploitation of parallelism. The results also point to the need for developing new benchmark suites more suitable to parallel compilation and execution.
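
As a rough illustration of what an upper bound on loop-level speedup can look like, the sketch below applies Amdahl's law to a program in which only the loops run in parallel. This is an assumption made for illustration: the abstract does not state which performance model the authors use, and the function name `amdahl_speedup` and the coverage values are hypothetical, not measurements from the paper.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Bound on whole-program speedup when only `parallel_fraction`
    of the sequential run time (e.g. time spent in parallel loops)
    is spread evenly across `workers` threads or cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical loop-coverage values, chosen only to show the trend.
for coverage in (0.25, 0.50, 0.90):
    for workers in (2, 4):
        bound = amdahl_speedup(coverage, workers)
        print(f"coverage={coverage:.2f} workers={workers} bound={bound:.2f}x")
```

Under this simple model, a program that spends little of its time in parallelizable loops gains little from extra cores. That is consistent with the abstract's point that sequential library code such as libc and sequential algorithm design limit the achievable performance.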



