|
ABSTRACT
Prefetching is a widely used technique in modern data storage systems. We study the most widely used class of prefetching algorithms known as sequential prefetching. There are two problems that plague the state-of-the-art sequential prefetching algorithms: (i) cache pollution, which occurs when prefetched data replaces more useful prefetched or demand-paged data, and (ii) prefetch wastage, which happens when prefetched data is evicted from the cache before it can be used. A sequential prefetching algorithm can have a fixed or adaptive degree of prefetch and can be either synchronous (when it can prefetch only on a miss) or asynchronous (when it can also prefetch on a hit). To capture these distinctions we define four classes of prefetching algorithms: fixed synchronous (FS), fixed asynchronous (FA), adaptive synchronous (AS), and adaptive asynchronous (AsynchA). We find that the relatively unexplored class of AsynchA algorithms is in fact the most promising for sequential prefetching. We provide a first formal analysis of the criteria necessary for optimal throughput when using an AsynchA algorithm in a cache shared by multiple steady sequential streams. We then provide a simple implementation called AMP (adaptive multistream prefetching) which adapts accordingly, leading to near-optimal performance for any kind of sequential workload and cache size. Our experimental setup consisted of an IBM xSeries 345 dual processor server running Linux using five SCSI disks. We observe that AMP convincingly outperforms all the contending members of the FA, FS, and AS classes for any number of streams and over all cache sizes. As anecdotal evidence, in an experiment with 100 concurrent sequential streams and varying cache sizes, AMP surpasses the FA, FS, and AS algorithms by 29--172%, 12--24%, and 21--210%, respectively, while outperforming OBL by a factor of 8. Even for complex workloads like SPC1-Read, AMP is consistently the best-performing algorithm. For the SPC2 video-on-demand workload, AMP can sustain at least 25% more streams than the next best algorithm. Furthermore, for a workload consisting of short sequences, where optimality is more elusive, AMP is able to outperform all the other contenders in overall performance. Finally, we implemented AMP in the state-of-the-art enterprise storage system, the IBM system storage DS8000 series. We demonstrated that AMP dramatically improves performance for common sequential and batch processing workloads and delivers up to a twofold increase in the sequential read capacity.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Pei Cao , Edward W. Felten , Anna R. Karlin , Kai Li, A study of integrated prefetching and caching strategies, Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, p.188-197, May 15-19, 1995, Ottawa, Ontario, Canada
|
| |
2
|
Chen, T. -F. and Baer, J. -L. 1992. Reducing memory latency via non-blocking and prefetching caches. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating System (ASPLOS). SIGPLAN Not. 27, 9, 51--61.
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
| |
6
|
|
| |
7
|
|
 |
8
|
|
 |
9
|
|
| |
10
|
|
| |
11
|
|
 |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
| |
16
|
IBM Redbooks. 2007. IBM System Storage DS8000 Series: Architecture and Implementation. http://www.redbooks.ibm.com/redbooks/pdfs/sg246786.pdf.
|
| |
17
|
Jain, P., Devadas, S., and Rudolph, L. 2001. Controlling cache pollution in prefetching with software-assisted cache replacement. Tech. Rep. CSG-462. Massachusetts Institute of Technology.
|
| |
18
|
|
| |
19
|
|
| |
20
|
|
 |
21
|
Tracy Kimbrel , Andrew Tomkins , R. Hugo Patterson , Brian Bershad , Pei Cao , Edward W. Felten , Garth A. Gibson , Anna R. Karlin , Kai Li, A trace-driven comparison of algorithms for parallel prefetching and caching, Proceedings of the second USENIX symposium on Operating systems design and implementation, p.19-34, October 29-November 01, 1996, Seattle, Washington, United States
|
| |
22
|
|
| |
23
|
|
| |
24
|
Lee, R. L., Yew, P. -C., and Lawrie, D. H. 1987. Data prefetching in shared memory multiprocessors. In Proceedings of the International Conference on Parallel Processing (ICPP). 28--31.
|
| |
25
|
|
| |
26
|
Mikko H. Lipasti , William J. Schmidt , Steven R. Kunkel , Robert R. Roediger, SPAID: software prefetching in pointer- and call-intensive environments, Proceedings of the 28th annual international symposium on Microarchitecture, p.231-236, November 29-December 01, 1995, Ann Arbor, Michigan, United States
|
 |
27
|
|
| |
28
|
McNutt, B. and Johnson, S. 2001. A standard test of I/O cache. In Proceedings of the Computer Measurements Group's International Conference.
|
| |
29
|
Metcalf, C. 1993. Data prefetching: A cost/performance analysis. Area Exam, MIT, October.
|
| |
30
|
|
 |
31
|
R. H. Patterson , G. A. Gibson , E. Ginting , D. Stodolsky , J. Zelenka, Informed prefetching and caching, Proceedings of the fifteenth ACM symposium on Operating systems principles, p.79-95, December 03-06, 1995, Copper Mountain, Colorado, United States
|
| |
32
|
|
 |
33
|
|
 |
34
|
Amir Roth , Andreas Moshovos , Gurindar S. Sohi, Dependence based prefetching for linked data structures, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.115-126, October 02-07, 1998, San Jose, California, United States
|
 |
35
|
|
| |
36
|
Storage Performance Council. 2006a. SPC Benchmark-1: Specification, version 1.10.1. www.storageperformance.org.
|
| |
37
|
Storage Performance Council. 2006b. SPC Benchmark-2: Specification, version 1.2. www.storageperformance.org.
|
| |
38
|
|
 |
39
|
|
INDEX TERMS
Primary Classification:
B.
Hardware
B.3
MEMORY STRUCTURES
B.3.2
Design Styles
Subjects:
Cache memories
Additional Classification:
C.
Computer Systems Organization
C.4
PERFORMANCE OF SYSTEMS
Subjects:
Design studies
D.
Software
D.4
OPERATING SYSTEMS
D.4.2
Storage Management
Subjects:
Main memory
D.4.8
Performance
Subjects:
Measurements
General Terms:
Algorithms,
Design,
Experimentation,
Performance
Keywords:
Adaptive prefetching,
asynchronous prefetching,
cache pollution,
degree of prefetch,
fixed prefetching,
multistream read,
optimal prefetching,
prefetch wastage,
prestaging,
sequential prefetching,
synchronous prefetching,
trigger distance
|