Decoupled store completion/silent deterministic replay: enabling scalable data memory for CPR/CFP processors
Source
International Symposium on Computer Architecture
Proceedings of the 36th annual international symposium on Computer architecture
Austin, TX, USA
SESSION: Load and stores
Pages 245-254
Year of Publication: 2009
ISBN: 978-1-60558-526-0
Authors
Andrew Hilton, University of Pennsylvania, Philadelphia, PA, USA
Amir Roth, University of Pennsylvania, Philadelphia, PA, USA
ABSTRACT
CPR/CFP (Checkpoint Processing and Recovery/Continual Flow Pipeline) processors support an adaptive instruction window that scales to tolerate last-level cache misses. CPR/CFP scale the register file by aggressively reclaiming the destination registers of many in-flight instructions. However, no analogous mechanism exists for stores and loads. As the window expands, CPR/CFP processors must track all in-flight stores and loads to support forwarding and to detect memory ordering violations. The previously described SVW (Store Vulnerability Window) and SQIP (Store Queue Index Prediction) schemes provide scalable, non-associative load and store queues, respectively. However, they do not work smoothly in a CPR/CFP context. SVW/SQIP rely on the ability to dynamically stall some loads until a specific older store writes to the cache. Enforcing this serialization in CPR/CFP is expensive if the load and store are in the same checkpoint. We introduce two complementary procedures that implement this serialization efficiently. Decoupled Store Completion (DSC) allows stores to write to the cache before the enclosing checkpoint completes execution. Silent Deterministic Replay (SDR) supports mis-speculation recovery in the presence of DSC by replaying loads older than completed stores using values from the load queue. The combination of DSC and SDR enables an SVW/SQIP-based CPR/CFP memory system that outperforms previous designs while occupying less area.
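To make the interaction between DSC and SDR concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes a single checkpoint, a dictionary standing in for the data cache, and a load queue that simply records the value each load observed. The class `ToyCheckpoint`, its fields, and the `(op, addr, value)` instruction encoding are all hypothetical names introduced for illustration. As a simplification, every load that executed before recovery is replayed from the load queue, rather than only those older than a completed store.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCheckpoint:
    """Toy model of one checkpoint (hypothetical, for illustration only)."""
    cache: dict                                      # address -> value, stands in for the data cache
    load_queue: list = field(default_factory=list)   # values observed by loads, in program order
    completed_stores: int = 0                        # stores that have already written the cache

    def run(self, program, replay=False):
        """Execute (op, addr, value) tuples and return the values the loads observed."""
        observed = []
        loads_seen = 0
        stores_seen = 0
        for op, addr, value in program:
            if op == "load":
                if replay and loads_seen < len(self.load_queue):
                    # SDR-style replay: reuse the recorded value instead of
                    # re-reading a cache that younger stores may have modified.
                    result = self.load_queue[loads_seen]
                else:
                    result = self.cache.get(addr, 0)
                    self.load_queue.append(result)
                loads_seen += 1
                observed.append(result)
            elif op == "store":
                if not (replay and stores_seen < self.completed_stores):
                    # DSC-style completion: write the cache immediately,
                    # before the enclosing checkpoint has finished.
                    self.cache[addr] = value
                    self.completed_stores += 1
                # Otherwise the store already completed; replay skips it silently.
                stores_seen += 1
        return observed


# A load, then a store to the same address, then a younger load.
program = [("load", 0x10, None), ("store", 0x10, 2), ("load", 0x10, None)]
cp = ToyCheckpoint(cache={0x10: 1})

first_pass = cp.run(program[:2])          # a mis-speculation is assumed to interrupt here
replayed = cp.run(program, replay=True)   # recovery restarts the checkpoint from its beginning
```

In this sketch the store has already overwritten address `0x10` by the time recovery begins, so re-reading the cache would hand the first load the wrong value. Replaying it from the load queue preserves what it originally observed, while the younger load reads the cache normally.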