ABSTRACT
The file-bundle caching problem arises frequently in scientific applications where jobs process several files concurrently. Consider a host system in a data grid that maintains a disk cache for servicing jobs, where each job requests a set of files and is serviced only if all of its requested files are present in the cache. Files must therefore be admitted into the cache and replaced in sets (file-bundles) rather than one at a time. We show that traditional caching algorithms based on file popularity measures perform poorly in this setting, since they may hold in cache combinations of files that no job requests together. We present and analyze a new caching algorithm aimed at maximizing job throughput and minimizing data replacement costs at such data-grid hosts. We tested the new algorithm using a disk cache simulation model under a wide range of conditions, including different file request distributions, cache sizes, and file size distributions. The results show significant improvement over traditional caching algorithms.
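The point about popularity-based caching can be illustrated with a toy example. The sketch below is not the algorithm of the paper; it assumes unit-size files, a hypothetical eight-job workload (`JOBS`), and a brute-force search (`best_bundle_cache`) that stands in for any bundle-aware policy. All function and variable names are made up for illustration.

```python
from itertools import combinations

# Hypothetical workload: each job is the set of files it needs at the same time.
JOBS = [
    frozenset({"a", "x"}), frozenset({"a", "y"}), frozenset({"a", "z"}),
    frozenset({"b", "x"}), frozenset({"b", "y"}), frozenset({"b", "z"}),
    frozenset({"c", "d"}), frozenset({"c", "d"}),
]

def serviceable(job, cache):
    """A job is serviced only if every file it requests is in the cache."""
    return job <= cache

def hits(jobs, cache):
    """Number of jobs that the given cache content can service."""
    return sum(serviceable(job, cache) for job in jobs)

def most_popular_files(jobs, capacity):
    """Popularity-based choice: keep the individually most requested files."""
    counts = {}
    for job in jobs:
        for f in job:
            counts[f] = counts.get(f, 0) + 1
    # Sort by descending request count, breaking ties by file name.
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    return set(ranked[:capacity])

def best_bundle_cache(jobs, capacity):
    """Brute force over all cache contents (assumption: unit-size files).
    Only feasible for tiny examples; used here purely as a reference point."""
    files = sorted(set().union(*jobs))
    candidates = (set(c) for c in combinations(files, capacity))
    return max(candidates, key=lambda c: hits(jobs, c))

popular = most_popular_files(JOBS, 2)
bundled = best_bundle_cache(JOBS, 2)
print(hits(JOBS, popular))  # 0: "a" and "b" are popular but never requested together
print(hits(JOBS, bundled))  # 2: "c" and "d" are less popular but form a complete bundle
```

With a cache of two files, the two most requested files service no job at all, while a less popular pair that forms a complete bundle services two jobs.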