Profiler and compiler assisted adaptive I/O prefetching for shared storage caches
Source
PACT
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques
Toronto, Ontario, Canada
Session: I/O optimizations
Pages 112-121
Year of Publication: 2008
ISBN: 978-1-60558-282-5
Authors
Seung Woo Son  Pennsylvania State University, University Park, PA, USA
Sai Prashanth Muralidhara  Pennsylvania State University, University Park, PA, USA
Ozcan Ozturk  Bilkent University, Ankara, Turkey
Mahmut Kandemir  Pennsylvania State University, University Park, PA, USA
Ibrahim Kolcu  University of Manchester, Manchester, United Kingdom
Mustafa Karakoy  Imperial College, London, United Kingdom
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1454115.1454133

ABSTRACT

I/O prefetching has long been used to hide large disk latencies. In parallel applications, however, it becomes problematic when multiple CPUs share the same set of disks, because prefetches from different CPUs can interact in complex and unpredictable ways in the shared memory caches at the I/O nodes. This paper makes three contributions. (i) We quantify the impact of compiler-directed I/O prefetching, originally developed for sequential execution, on shared caches at I/O nodes; the experimental data show that while I/O prefetching brings benefits, its effectiveness decreases significantly as the number of CPUs increases. (ii) We identify inter-CPU misses caused by harmful prefetches as one of the main sources of this performance loss at higher CPU counts. (iii) We propose and experimentally evaluate a profiler and compiler assisted adaptive I/O prefetching scheme targeting shared storage caches. The proposed scheme obtains inter-thread data sharing information through profiling and, based on the captured sharing patterns, divides the threads into clusters and assigns a separate (customized) I/O prefetcher thread to each cluster. The compiler generates the I/O prefetching threads automatically. We implemented the scheme using a compiler and the PVFS file system running on Linux, and the empirical data underline the importance of adapting I/O prefetching based on program phases. Specifically, with 8 CPUs, the proposed scheme improves performance on average by 19.9% over no I/O prefetching, 11.9% over independent I/O prefetching (each CPU performs compiler-directed I/O prefetching independently), and 10.3% over one-CPU prefetching (one CPU is reserved for prefetching on behalf of the others).
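
The clustering step described in the abstract can be illustrated with a small sketch. The abstract does not specify the sharing metric or the clustering algorithm, so everything below is an assumption made for illustration: the profile is modeled as a set of accessed block IDs per thread, sharing is measured with Jaccard similarity, threads are grouped by a simple single-linkage rule with a hypothetical `threshold`, and each cluster gets one deduplicated block list that a single prefetcher thread could serve. The names `jaccard`, `cluster_threads`, and `prefetch_plan` are hypothetical and do not come from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of blocks two threads have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_threads(profile, threshold=0.5):
    """Group thread IDs whose accessed-block sets overlap by at least `threshold`.

    Single-linkage grouping via union-find. This rule is an illustrative
    assumption, not the clustering algorithm used in the paper.
    """
    parent = {t: t for t in profile}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(sorted(profile), 2):
        if jaccard(profile[a], profile[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in sorted(profile):
        clusters.setdefault(find(t), []).append(t)
    return sorted(clusters.values())


def prefetch_plan(profile, clusters):
    """One sorted, deduplicated block list per cluster (one prefetcher thread each)."""
    return [sorted(set().union(*(profile[t] for t in c))) for c in clusters]


# Hypothetical profiling output: thread ID -> disk blocks it accessed.
profile = {
    0: {1, 2, 3, 4},
    1: {2, 3, 4, 5},
    2: {10, 11, 12},
    3: {11, 12, 13},
}
clusters = cluster_threads(profile)
plan = prefetch_plan(profile, clusters)
print(clusters)
print(plan)
```

In this toy profile, threads 0 and 1 share most of their blocks, as do threads 2 and 3, so each pair ends up with one shared prefetch list in which a commonly used block appears only once. That is the intuition behind assigning one prefetcher per cluster rather than letting every CPU prefetch independently; a real implementation would also have to account for access order, timing, and program phases, which this sketch ignores.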


