ACM Home Page
Please provide us with feedback. Feedback
Mining block correlations to improve storage performance
Full text PdfPdf (1.02 MB)
Source ACM Transactions on Storage (TOS) archive
Volume 1 ,  Issue 2  (May 2005) table of contents
Pages: 213 - 245  
Year of Publication: 2005
ISSN:1553-3077
Authors
Zhenmin Li  University of Illinois at Urbana-Champaign, Urbana, IL
Zhifeng Chen  University of Illinois at Urbana-Champaign, Urbana, IL
Yuanyuan Zhou  University of Illinois at Urbana-Champaign, Urbana, IL
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 28,   Downloads (12 Months): 176,   Citation Count: 7
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1063786.1063790
What is a DOI?

ABSTRACT

Block correlations are common semantic patterns in storage systems. They can be exploited for improving the effectiveness of storage caching, prefetching, data layout, and disk scheduling. Unfortunately, information about block correlations is unavailable at the storage system level. Previous approaches for discovering file correlations in file systems do not scale well enough for discovering block correlations in storage systems.In this article, we propose two algorithms, C-Miner and C-Miner*, that use a data mining technique called frequent sequence mining to discover block correlations in storage systems. Both algorithms run reasonably fast with feasible space requirement, indicating that they are practical for dynamically inferring correlations in a storage system. C-Miner is a direct application of a frequent-sequence mining algorithm with a few modifications; compared with C-Miner, C-Miner* is redesigned for mining block correlations by making concessions for the specific problem of long sequences in storage system traces. Therefore, C-Miner* can discover 7--109% more correlation rules within 2--15 times shorter time than C-Miner. Moreover, we have also evaluated the benefits of block correlation-directed prefetching and data layout through experiments. Our results using real system workloads show that correlation-directed prefetching and data layout can reduce average I/O response time by 12--30% compared to the base case, and 7--25% compared to the commonly used sequential prefetching scheme for most workloads.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
Anthes, G. H. 2002. Storage virtualization: The next step. Computer World, January 28, 2002, p. 43.
 
4
Ari, I., Amer, A., Miller, E., Brandt, S., and Long, D. 2002. Who is more adaptive? ACME: Adaptive caching using multiple experts. In Proceedings of the Workshop on Distributed Data and Structures (WDAS).
5
6
7
8
 
9
Cao, P., Felten, E., and Li, K. 1994. Application-controlled file caching policies. In Proceedings of the USENIX Summer 1994 Technical Conference. 171--182.
10
11
 
12
 
13
Chen, Z., Zhou, Y., and Li, K. 2003. Eviction-based cache placement for storage caches. In Proceedings of the Conference on 2003 USENIX Annual Technical Conference. 269--282.
14
 
15
Chou, H. and DeWitt, D. 1993. An evaluation of buffer management strategies for relational database systems. In Proceedings of the 19th International Conference on Very Large Data Bases (Dublin, Ireland). 127--141.
 
16
Clifton, C. and Gengo, G. 2000. Developing custom intrusion detection filters using data mining. In Proceedings of the 2000 Military Communications International Symposium (MILCOM2000, Los Angeles, CA).
17
 
18
Eaton, P. R., Geels, D., and Mori, G. 1999. Clump: Improving file system performance through adaptive optimizations. Go to http://www.citeseer.csail.mit.edu/eatox99clump.html.
 
19
EMC Corporation. 1999. Symmetrix 3000 and 5000 Enterprise Storage Systems Product Description Guide. EMC Corporation, Hopkinton, MA. Web site: http://www.emc.com.
 
20
Ganger, G. 1995. Systemoriented evaluation of I/O subsystem performance. Tech. rep. CSE-TR-243-95. University of Michigan, Ann Arbor, MI.
 
21
 
22
23
 
24
Griffioen, J. and Appleton, R. 1994. Reducing file system latency using a predictive approach. In Proceedings of the 1994 Summer USENIX Conference.
 
25
Griffioen, J. and Appleton, R. 1995. Performance measurements of automatic prefetching. In Proceedings of the International Conference on Parallel and Distributed Computing Systems.
 
26
Han, J. 2002. How can data mining help bio-data analysis? In Proceedings of the 2002 Workshop on Data Mining in Bioinformatics (BIOKDD'02, Edmonton, Canada). 1--4.
 
27
28
 
29
IBM. 2002. Storage Tank, a distributed storage system. IBM White paper. Web site: http://www.almaden.ibm.com/StorageSystems/file_systems/storage_tank/papers.shtml.
 
30
Keeton, K. and Wilkes, J. 2002. Automating data dependability. In Proceedings of 10th ACM-SIGOPS European Workshop.
 
31
Kim, J., Choi, J., Kim, J., Noh, S., Min, S., Cho, Y., and Kim, C. 2000. A low-overhead high-performance unified buffer management scheme that exploits sequential and looping references. In Proceedings of the 4th Symposium on Operating Systems Design and Implementation (OSDI, San Diego, CA). 119--134.
32
 
33
Kroeger, T. M. and Long, D. D. E. 1995. Predicting file-system actions from prior events. In Proceedings of the 1996 USENIX Annual Technical Conference. 319--328.
 
34
Kuenning, G. 1994. Design of the SEER predictive caching scheme. In Proceedings of the Workshop on Mobile Computing Systems and Applications.
35
36
 
37
Lee, W. and Stolfo, S. 1998. Data mining approaches for intrusion detection. In Proceedings of the 7th USENIX Security Symposium (San Antonio, TX).
 
38
Lei, H. and Duchamp, D. 1997. An analytical approach to file prefetching. In Proceedings of the 1997 USENIX Annual Technical Conference (Anaheim, CA.)
39
40
41
 
42
 
43
44
 
45
46
 
47
 
48
Pitkow, J. E. and Pirolli, P. 1999. Mining longest repeating subsequences to predict World Wide Web surfing. In Proceedings of the USENIX Symposium on Internet Technologies and Systems.
 
49
Ruemmler, C. and Wilkes, J. 1993a. A trace-driven analysis of disk working set sizes. Tech. rep. HPL-OSR-93-23. Hewlett-Packard Laboratories, Palo Alto, CA.
 
50
Ruemmler, C. and Wilkes, J. 1993b. UNIX disk access patterns. In Proceedings of the Winter 1993 USENIX Conference.
 
51
Salmon, B., Thereska, E., Soules, C. A., and Ganger, G. R. 2003. A two-tiered software architecture for automated tuning of disk layouts. In Proceedings of the First Workshop on Algorithms and Architectures for Self-Managing Systems.
 
52
 
53
 
54
Seifert, A. and Scholl, M. H. 2002. A multi-version cache replacement and prefetching policy for hybrid data delivery environments. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB).
 
55
56
 
57
Smith, B. J. 1978b. A pipelined, shared resource MIMD computer. In Proceedings of International Conference on Parallel Processing. 6--8.
58
 
59
Storage Performance Council. 2004. SPC I/O traces. Web site: http://www.storageperformance.org/.
60
61
62
 
63
 
64
65
66
 
67
 
68
Yan, X., Han, J., and Afshar, R. 2003. CloSpan: Mining closed sequential patterns in large datasets. In Proceedings of the 2003 SIAM International Conference Data Mining (SDM'03, San Francisco, CA).
 
69
 
70
 
71


Collaborative Colleagues:
Zhenmin Li: colleagues
Zhifeng Chen: colleagues
Yuanyuan Zhou: colleagues