Tight analysis of the performance potential of thread speculation using SPEC CPU 2006
Source
Principles and Practice of Parallel Programming
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
San Jose, California, USA
SESSION: Thread-level speculation
Pages: 215-225
Year of Publication: 2007
ISBN: 978-1-59593-602-8
Authors
Arun Kejariwal  University of California, Irvine, Irvine, CA
Xinmin Tian  Intel Corporation, Santa Clara, CA
Milind Girkar  Intel Corporation, Santa Clara, CA
Wei Li  Intel Corporation, Santa Clara, CA
Sergey Kozhukhov  Intel Corporation, Santa Clara, CA
Utpal Banerjee  Intel Corporation, Santa Clara, CA
Alexander Nicolau  University of California, Irvine, Irvine, CA
Alexander V. Veidenbaum  University of California, Irvine, Irvine, CA
Constantine D. Polychronopoulos  University of Illinois at Urbana-Champaign, Urbana-Champaign, IL
Sponsors
ACM: Association for Computing Machinery
SIGPLAN: ACM Special Interest Group on Programming Languages
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1229428.1229475

ABSTRACT

Multi-core processors, such as the Intel® Core™2 Duo, facilitate efficient thread-level parallel execution of ordinary programs, wherein the different threads of execution are mapped onto different physical processors. In this context, several techniques have been proposed for auto-parallelization of programs. Recently, thread-level speculation (TLS) has been proposed as a means to parallelize difficult-to-analyze serial codes. In general, more than one technique can be employed for parallelizing a given program. The overlapping applicability of the various techniques makes it hard to assess the intrinsic performance potential of each. In this paper, we present a tight analysis of the (unique) performance potential of both (a) TLS in general and (b) specific types of thread-level speculation, viz., control speculation, data dependence speculation, and data value speculation, for the SPEC CPU2006 benchmark suite in light of limiting factors such as the threading overhead and the misspeculation penalty. To the best of our knowledge, this is the first evaluation of TLS that is based on SPEC CPU2006 and accounts for the aforementioned real-life constraints. Our analysis shows that, at the innermost loop level, the upper bound on the speedup uniquely achievable via TLS with state-of-the-art thread implementations for both SPEC CINT2006 and CFP2006 is of the order of 1%.
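
How the limiting factors named in the abstract can cap a speedup may be illustrated with a back-of-the-envelope estimate. The sketch below is not the paper's analysis. It is an Amdahl-style bound under assumed simplifications: whole-program serial time is normalized to 1, a fraction `coverage` of it is spent in speculatively parallelized loops, a successful speculation divides that time evenly among `threads`, and a misspeculation re-executes the loop serially plus a fixed penalty. The function name and every parameter are hypothetical and chosen only for illustration.

```python
def tls_speedup_bound(coverage, threads, overhead=0.0,
                      misspec_rate=0.0, misspec_penalty=0.0):
    """Illustrative Amdahl-style speedup estimate for a speculatively
    parallelized loop (assumed model, not taken from the paper).

    coverage        -- fraction of serial run time spent in the loop (0..1)
    threads         -- number of threads the loop is spread across (>= 1)
    overhead        -- threading overhead, as a fraction of serial run time
    misspec_rate    -- probability that speculation fails (0..1)
    misspec_penalty -- extra cost of a failed speculation, as a fraction
                       of serial run time
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if not 0.0 <= misspec_rate <= 1.0:
        raise ValueError("misspec_rate must lie in [0, 1]")

    serial_part = 1.0 - coverage                 # code outside the loop
    success_time = coverage / threads            # speculation succeeded
    failure_time = coverage + misspec_penalty    # squash, then rerun serially
    loop_time = ((1.0 - misspec_rate) * success_time
                 + misspec_rate * failure_time
                 + overhead)
    return 1.0 / (serial_part + loop_time)


if __name__ == "__main__":
    # Made-up inputs: a loop covering 2% of run time, two threads,
    # first without and then with a small threading overhead.
    print(tls_speedup_bound(0.02, 2))
    print(tls_speedup_bound(0.02, 2, overhead=0.005))
```

Under this assumed model, a loop that accounts for only a small share of run time yields a speedup close to 1 even when speculation always succeeds, and any threading overhead or misspeculation cost reduces it further. The inputs above are made up and are not measurements from the paper.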



