Boosting single-thread performance in multi-core systems through fine-grain multi-threading
Source
International Symposium on Computer Architecture
Proceedings of the 36th annual international symposium on Computer architecture
Austin, TX, USA
SESSION: Speculative threading and parallelization
Pages 474-483
Year of Publication: 2009
ISBN: 978-1-60558-526-0
Authors
Carlos Madriles
Pedro López
Josep M. Codina
Enric Gibert
Fernando Latorre
Alejandro Martinez
Raúl Martinez
Antonio Gonzalez
Affiliation (all authors): Intel Barcelona Research Center, Intel Labs - Universitat Politecnica de Catalunya, Barcelona, Spain
ABSTRACT
Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single-thread performance remains of paramount importance, since some applications have limited thread-level parallelism (TLP), and even a small portion of code with limited TLP imposes significant constraints on overall performance, as explained by Amdahl's law. In this paper we propose a novel approach for leveraging multiple cores to improve single-thread performance in a multi-core design. The proposed technique features a set of novel hardware mechanisms that support the execution of threads generated at compile time. These threads result from a fine-grain speculative decomposition of the original application, and they are executed on a modified multi-core system that includes: (1) mechanisms to support multiple versions; (2) mechanisms to detect violations among threads; (3) mechanisms to reconstruct the original sequential order; and (4) mechanisms to checkpoint the architectural state and recover from misspeculations. On Spec2006, the proposed scheme outperforms previous hardware-only schemes for combining cores to execute single-thread applications in a multi-core design by more than 10% on average for all configurations. Moreover, single-thread performance improves by 41% on average when the proposed scheme is used on a Tiny Core, and by up to 2.6x for some selected applications.
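The Amdahl's law argument in the abstract can be made concrete with a small calculation. The sketch below is illustrative only: the function name, the `serial_speedup` parameter, and the sample fractions are assumptions introduced here, not figures or code from the paper. It shows how the code that does not parallelize caps overall speedup, and why accelerating that single-thread portion still pays off on a many-core system.

```python
def amdahl_speedup(parallel_fraction: float, cores: int,
                   serial_speedup: float = 1.0) -> float:
    """Overall speedup predicted by Amdahl's law.

    parallel_fraction: share of execution time that scales with core count.
    cores: number of cores applied to the parallel share.
    serial_speedup: hypothetical extra speedup applied to the serial share
        (1.0 means the serial code runs at its original speed).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1 or serial_speedup <= 0.0:
        raise ValueError("cores must be >= 1 and serial_speedup must be > 0")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction / serial_speedup + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 90% of the time parallelizes, 10% does not.
    for cores in (2, 4, 16, 64):
        baseline = amdahl_speedup(0.9, cores)
        # Assumption for illustration: the serial 10% runs 1.41x faster.
        boosted = amdahl_speedup(0.9, cores, serial_speedup=1.41)
        print(f"{cores:3d} cores: baseline {baseline:.2f}x, "
              f"with faster serial part {boosted:.2f}x")
```

With a 10% serial share, the baseline speedup can never reach 10x no matter how many cores are added, which is the constraint the abstract refers to. The 1.41 factor reuses the abstract's average single-thread gain purely as a sample input; the paper does not report results in this form.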