Tight analysis of the performance potential of thread speculation using SPEC CPU 2006
Source
Principles and Practice of Parallel Programming
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
San Jose, California, USA
SESSION: Thread-level speculation
Pages: 215 - 225
Year of Publication: 2007
ISBN: 978-1-59593-602-8
Authors
Arun Kejariwal (University of California, Irvine, Irvine, CA)
Xinmin Tian (Intel Corporation, Santa Clara, CA)
Milind Girkar (Intel Corporation, Santa Clara, CA)
Wei Li (Intel Corporation, Santa Clara, CA)
Sergey Kozhukhov (Intel Corporation, Santa Clara, CA)
Utpal Banerjee (Intel Corporation, Santa Clara, CA)
Alexander Nicolau (University of California, Irvine, Irvine, CA)
Alexander V. Veidenbaum (University of California, Irvine, Irvine, CA)
Constantine D. Polychronopoulos (University of Illinois at Urbana-Champaign, Urbana-Champaign, IL)
ABSTRACT
Multi-cores, such as the Intel® Core™2 Duo processor, facilitate efficient thread-level parallel execution of ordinary programs, wherein the different threads of execution are mapped onto different physical processors. In this context, several techniques have been proposed for auto-parallelization of programs. Recently, thread-level speculation (TLS) has been proposed as a means to parallelize difficult-to-analyze serial codes. In general, more than one technique can be employed for parallelizing a given program. The overlapping applicability of the various techniques makes it hard to assess the intrinsic performance potential of each. In this paper, we present a tight analysis of the (unique) performance potential of both (a) TLS in general and (b) specific types of thread-level speculation, viz., control speculation, data dependence speculation and data value speculation, for the SPEC CPU2006 benchmark suite in light of the various limiting factors such as the threading overhead and misspeculation penalty. To the best of our knowledge, this is the first evaluation of TLS that is based on SPEC CPU2006 and accounts for the aforementioned real-life constraints. Our analysis shows that, at the innermost loop level, the upper bound on the speedup uniquely achievable via TLS with the state-of-the-art thread implementations for both SPEC CINT2006 and CFP2006 is of the order of 1%.
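To illustrate why threading overhead and misspeculation penalty can reduce the achievable speedup to a few percent, here is a minimal Amdahl-style sketch. It is not the model used in the paper. The function name, its parameters, and the choice to express every cost as a fraction of the original serial run time are all assumptions made for illustration.

```python
def tls_speedup_bound(coverage, threads, overhead_frac=0.0,
                      misspec_rate=0.0, misspec_penalty_frac=0.0):
    """Amdahl-style estimate of whole-program speedup (hypothetical model).

    coverage             -- fraction of serial run time spent in loops that
                            only speculation can parallelize (0..1)
    threads              -- number of threads running those loops
    overhead_frac        -- threading overhead, as a fraction of serial time
    misspec_rate         -- fraction of speculative work that is squashed
    misspec_penalty_frac -- cost of redoing squashed work, as a fraction of
                            serial time when everything is squashed
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be >= 1")

    serial_part = 1.0 - coverage          # code left untouched
    parallel_part = coverage / threads    # ideal time for the covered loops
    penalty = misspec_rate * misspec_penalty_frac
    new_time = serial_part + parallel_part + overhead_frac + penalty
    return 1.0 / new_time


if __name__ == "__main__":
    # Illustrative inputs only: 2% coverage on 2 threads, with and without
    # a threading overhead equal to 1% of the serial run time.
    print(tls_speedup_bound(0.02, 2))
    print(tls_speedup_bound(0.02, 2, overhead_frac=0.01))
```

Under these made-up inputs, a loop set covering 2% of run time yields roughly a 1% gain on two threads, and an overhead of the same size as the saved time cancels the gain.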