Stream chaining: exploiting multiple levels of correlation in data prefetching
International Symposium on Computer Architecture
Proceedings of the 36th annual international symposium on Computer architecture
Austin, TX, USA
SESSION: Prefetching and streaming
Pages 81-92
Year of Publication: 2009
ISBN: 978-1-60558-526-0
Authors
Pedro Diaz, University of Edinburgh, Edinburgh, United Kingdom
Marcelo Cintra, University of Edinburgh, Edinburgh, United Kingdom
ABSTRACT
Data prefetching has long been an important technique to amortize the effects of the memory wall, and is likely to remain so in the current era of multi-core systems. Most prefetchers operate by identifying patterns and correlations in the miss address stream. Separating streams according to the memory access instruction that generates the misses is an effective way of filtering out spurious addresses from predictable streams. On the other hand, by localizing streams based on the memory access instructions, such prefetchers both lose the complete time sequence information of misses and can only issue prefetches for a single memory access instruction at a time. This paper proposes a novel class of prefetchers based on the idea of linking various localized streams into predictable chains of missing memory access instructions such that the prefetcher can issue prefetches along multiple streams. In this way the prefetcher is not limited to prefetching deeply for a single missing memory access instruction but can instead adaptively prefetch for other memory access instructions closer in time. Experimental results show that the proposed prefetcher consistently achieves better performance than a state-of-the-art prefetcher while consuming the same amount of memory bandwidth: it is 10% better on average and as much as 55% better in the best case, and it is outperformed in very few cases, and then by only 2%.
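The idea of chaining localized streams can be illustrated with a small software model. The sketch below is not the hardware design from the paper. It assumes a simple constant-stride predictor for each instruction-localized stream and a single "next missing instruction" link per instruction. The class name `ChainedStreamPrefetcher`, its parameters, and the example addresses are all hypothetical and chosen only for illustration.

```python
from collections import deque


class ChainedStreamPrefetcher:
    """Toy model: per-instruction (PC-localized) miss streams linked into a chain.

    Assumptions (not from the paper): each stream is predicted with a
    constant-stride rule, and each PC remembers only the PC that missed
    immediately after it.
    """

    def __init__(self, history=3, chain_length=3):
        self.history = max(3, history)    # miss addresses kept per PC
        self.chain_length = chain_length  # max number of streams followed per miss
        self.streams = {}                 # pc -> deque of recent miss addresses
        self.next_pc = {}                 # pc -> pc that missed next (chain link)
        self.last_pc = None

    def _predict(self, pc):
        """Return the next address of pc's stream if its last two strides agree."""
        stream = self.streams.get(pc)
        if stream is None or len(stream) < 3:
            return None
        a, b, c = list(stream)[-3:]
        if c - b == b - a and c != b:
            return c + (c - b)
        return None

    def on_miss(self, pc, addr):
        """Record a miss and return (pc, address) prefetches along the chain."""
        self.streams.setdefault(pc, deque(maxlen=self.history)).append(addr)
        # Link the previously missing instruction to this one.
        if self.last_pc is not None and self.last_pc != pc:
            self.next_pc[self.last_pc] = pc
        self.last_pc = pc

        # Walk the chain: prefetch for this PC, then for the PCs expected to miss next.
        prefetches, visited, cur = [], set(), pc
        while cur is not None and cur not in visited and len(visited) < self.chain_length:
            visited.add(cur)
            target = self._predict(cur)
            if target is not None:
                prefetches.append((cur, target))
            cur = self.next_pc.get(cur)
        return prefetches


# Two instructions (hypothetical PCs 0xA and 0xB) whose misses interleave in time.
pf = ChainedStreamPrefetcher()
trace = [(0xA, 0x1000), (0xB, 0x8000), (0xA, 0x1040), (0xB, 0x8100),
         (0xA, 0x1080), (0xB, 0x8200), (0xA, 0x10C0)]
for miss_pc, miss_addr in trace:
    issued = pf.on_miss(miss_pc, miss_addr)
print([(hex(p), hex(a)) for p, a in issued])
```

In this trace, a prefetcher that localizes by instruction alone would issue prefetches only for the stream of the instruction that just missed. The chained model also issues one for the instruction expected to miss next, so the final miss yields one prefetch for each of the two streams.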