Profiler and compiler assisted adaptive I/O prefetching for shared storage caches
Source: PACT, Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques
Toronto, Ontario, Canada
Session: I/O optimizations
Pages 112-121
Year of Publication: 2008
ISBN: 978-1-60558-282-5
Authors
Seung Woo Son, Pennsylvania State University, University Park, PA, USA
Sai Prashanth Muralidhara, Pennsylvania State University, University Park, PA, USA
Ozcan Ozturk, Bilkent University, Ankara, Turkey
Mahmut Kandemir, Pennsylvania State University, University Park, PA, USA
Ibrahim Kolcu, University of Manchester, Manchester, United Kingdom
Mustafa Karakoy, Imperial College, London, United Kingdom
ABSTRACT
I/O prefetching has long been used to hide large disk latencies. In parallel applications, however, it becomes problematic when multiple CPUs share the same set of disks, because prefetches issued by different CPUs can interact in the shared memory caches of the I/O nodes in complex and unpredictable ways. In this paper, we (i) quantify the impact of compiler-directed I/O prefetching, originally developed for sequential execution, on shared caches at I/O nodes; the experimental data show that while I/O prefetching brings benefits, its effectiveness drops significantly as the number of CPUs increases; (ii) identify inter-CPU misses caused by harmful prefetches as one of the main sources of this performance loss; and (iii) propose and experimentally evaluate a profiler and compiler assisted adaptive I/O prefetching scheme targeting shared storage caches. The proposed scheme obtains inter-thread data sharing information through profiling and, based on the captured sharing patterns, divides the threads into clusters and assigns a separate, customized I/O prefetcher thread to each cluster. The compiler generates the I/O prefetching threads automatically. We implemented the scheme using a compiler and the PVFS file system running on Linux, and the empirical data clearly underline the importance of adapting I/O prefetching to program phases. Specifically, with 8 CPUs, the proposed scheme improves performance on average by 19.9% over no I/O prefetching, 11.9% over independent I/O prefetching (each CPU performs compiler-directed I/O prefetching independently), and 10.3% over one-CPU prefetching (one CPU is reserved for prefetching on behalf of the others).
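The clustering step described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not specify the similarity metric or the clustering procedure, so the Jaccard overlap measure, the 0.5 threshold, and the names `jaccard`, `cluster_threads`, and `prefetch_plan` are all assumptions made for illustration. The profile is modeled as a mapping from thread ID to the set of disk blocks that thread was observed to access.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two block sets (assumed metric, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_threads(profile, threshold=0.5):
    """Group threads whose profiled block sets overlap by >= threshold.

    profile: dict mapping thread ID -> set of accessed block IDs.
    Returns a sorted list of clusters, each a sorted list of thread IDs.
    """
    parent = {t: t for t in profile}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Merge any two threads that share enough data.
    for a, b in combinations(sorted(profile), 2):
        if jaccard(profile[a], profile[b]) >= threshold:
            parent[find(b)] = find(a)

    clusters = {}
    for t in sorted(profile):
        clusters.setdefault(find(t), []).append(t)
    return sorted(clusters.values())


def prefetch_plan(profile, clusters):
    """One block list per cluster: what that cluster's prefetcher would fetch."""
    return [sorted(set().union(*(profile[t] for t in c))) for c in clusters]


if __name__ == "__main__":
    # Hypothetical profile: threads 0 and 1 share blocks, as do threads 2 and 3.
    profile = {
        0: {1, 2, 3, 4},
        1: {2, 3, 4, 5},
        2: {10, 11, 12},
        3: {10, 11, 12, 13},
    }
    clusters = cluster_threads(profile)
    print(clusters)
    print(prefetch_plan(profile, clusters))
```

With the hypothetical profile above, the four threads fall into two clusters, and each cluster's prefetcher covers the union of its members' blocks, so a shared block is fetched once per cluster rather than once per CPU.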