Replication and retrieval strategies of multidimensional data on parallel disks
Source: Conference on Information and Knowledge Management
Proceedings of the twelfth international conference on Information and knowledge management
New Orleans, LA, USA
Session: Database session 1: querying high-dimensional data
Pages: 32-39
Year of Publication: 2003
ISBN: 1-58113-723-0
Authors
Chung-Min Chen  Telcordia Technologies
Christine T. Cheng  University of Wisconsin-Milwaukee, WI
Sponsors
ACM: Association for Computing Machinery
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/956863.956871

ABSTRACT

Besides improving data availability during disk failures, replication of data can also speed up the I/O performance of read-intensive applications. Two issues need to be addressed: (a) data placement (which disks should store the copies of each data block?) and (b) scheduling (given a query Q and a placement scheme P of the data, from which disk should each block in Q be retrieved so that retrieval time is minimized?). In this paper, we consider range queries and assume that the dataset is a multidimensional grid and that r copies of each unit block of the grid must be stored among M disks. To accurately measure the performance of a scheduling algorithm, we consider a metric that takes into account the scheduling overhead as well as the time it takes to retrieve the data blocks from the disks. We describe several combinations of data placement schemes and scheduling algorithms and analyze their performance for range queries with respect to this metric. We then present simulation results for the most interesting case, r = 2, showing that the strategies perform better than the previously known method, especially for large queries.
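
The two issues named in the abstract can be made concrete with a small sketch. The code below is not taken from the paper: the placement rule in `replica_disks` is a hypothetical stand-in for a placement scheme P with r = 2, and the scheduler is a generic augmenting-path search that finds, for a 2-D range query Q, an assignment of blocks to replica disks with the smallest possible maximum number of blocks read from any one disk. It ignores the scheduling-overhead part of the paper's metric. All function names are assumptions made for illustration.

```python
from itertools import product
from math import ceil


def replica_disks(i, j, M):
    """Hypothetical r = 2 placement for block (i, j) on M >= 2 disks."""
    first = (i + 2 * j) % M
    second = (first + M // 2) % M  # always differs from `first` when M >= 2
    return (first, second)


def try_schedule(blocks, M, cap):
    """Assign each block to one replica disk, at most `cap` blocks per disk.

    Returns {block: disk}, or None if no such assignment exists.
    """
    load = [[] for _ in range(M)]  # blocks currently read from each disk
    choice = {}

    def place(block, seen):
        for d in replica_disks(*block, M):
            if d in seen:
                continue
            seen.add(d)
            if len(load[d]) < cap:
                load[d].append(block)
                choice[block] = d
                return True
            # Disk d is full: try to move one of its blocks to another replica.
            for other in list(load[d]):
                if place(other, seen):
                    load[d].remove(other)
                    load[d].append(block)
                    choice[block] = d
                    return True
        return False

    for b in blocks:
        if not place(b, set()):
            return None
    return choice


def schedule(blocks, M):
    """Return (max_load, {block: disk}) with the smallest feasible max_load."""
    blocks = list(blocks)
    cap = ceil(len(blocks) / M)  # no schedule can beat this lower bound
    while True:
        choice = try_schedule(blocks, M, cap)
        if choice is not None:
            return cap, choice
        cap += 1


def range_query(i0, i1, j0, j1):
    """Blocks (i, j) with i0 <= i < i1 and j0 <= j < j1."""
    return list(product(range(i0, i1), range(j0, j1)))


if __name__ == "__main__":
    query = range_query(0, 3, 0, 4)  # a 3 x 4 range query
    max_load, choice = schedule(query, M=5)
    print("blocks:", len(query), "max blocks on one disk:", max_load)
```

Raising the per-disk cap one step at a time keeps the sketch short; for large queries a binary search over the cap, or a single max-flow computation, would be the more usual choice.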


