Spatio-temporal memory streaming
Source
International Symposium on Computer Architecture
Proceedings of the 36th annual international symposium on Computer architecture
Austin, TX, USA
SESSION: Prefetching and streaming
Pages 69-80
Year of Publication: 2009
ISBN: 978-1-60558-526-0
Authors
Stephen Somogyi, Carnegie Mellon University, Pittsburgh, PA, USA
Thomas F. Wenisch, University of Michigan, Ann Arbor, MI, USA
Anastasia Ailamaki, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland
Babak Falsafi, Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland
ABSTRACT
Recent research advocates memory streaming techniques to alleviate the performance bottleneck caused by the high latencies of off-chip memory accesses. Temporal memory streaming replays previously observed miss sequences to eliminate long chains of dependent misses. Spatial memory streaming predicts repetitive data layout patterns within fixed-size memory regions. Because each technique targets a different subset of misses, their effectiveness varies across workloads and each leaves a significant fraction of misses unpredicted. In this paper, we propose Spatio-Temporal Memory Streaming (STeMS) to exploit the synergy between spatial and temporal streaming. We observe that the order of spatial accesses repeats both within and across regions. STeMS records and replays the temporal sequence of region accesses and uses spatial relationships within each region to dynamically reconstruct a predicted total miss order. Using trace-driven and cycle-accurate simulation across a suite of commercial workloads, we demonstrate that with similar implementation complexity as temporal streaming, STeMS achieves equal or higher coverage than spatial or temporal memory streaming alone, and improves performance by 31%, 3%, and 18% over stride, spatial, and temporal prediction, respectively.
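The idea of recording a temporal sequence of region accesses and expanding each region with its spatial pattern can be illustrated with a toy software model. The sketch below is not the STeMS hardware design described in the paper: the names `REGION_SIZE`, `train`, and `reconstruct` are hypothetical, the region size is an arbitrary illustrative value, and the reconstruction simply concatenates per-region patterns rather than interleaving accesses from different regions.

```python
REGION_SIZE = 8  # blocks per region; illustrative value, not from the paper


def train(misses):
    """Split a miss trace (list of block numbers) into a temporal log of
    regions and a per-region spatial pattern (offsets in first-touch order)."""
    patterns = {}  # region -> ordered list of block offsets seen in it
    for block in misses:
        region, offset = divmod(block, REGION_SIZE)
        offsets = patterns.setdefault(region, [])
        if offset not in offsets:
            offsets.append(offset)
    # dicts preserve insertion order, so this is the order of first misses
    trigger_log = list(patterns)
    return trigger_log, patterns


def reconstruct(trigger_log, patterns, start_region):
    """Replay the recorded region sequence from start_region onward and
    expand each region with its spatial pattern into a predicted miss order."""
    if start_region not in trigger_log:
        return []  # nothing recorded for this region, so no prediction
    start = trigger_log.index(start_region)
    predicted = []
    for region in trigger_log[start:]:
        base = region * REGION_SIZE
        predicted.extend(base + offset for offset in patterns[region])
    return predicted


# A made-up miss trace touching regions 2, 5, and 1.
trace = [16, 17, 40, 19, 41, 8]
log, pats = train(trace)
print(log)                        # [2, 5, 1]
print(reconstruct(log, pats, 5))  # [40, 41, 8]
```

In this toy model, a later miss to region 5 triggers a prediction for the rest of that region and for the regions that followed it in the recorded sequence, which loosely mirrors how the temporal and spatial components complement each other.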