Partitioned signature files: design issues and performance evaluation
Source: ACM Transactions on Information Systems (TOIS)
Volume 7, Issue 2 (April 1989)
Pages: 158-180
Year of Publication: 1989
ISSN: 1046-8188
Authors
Dik Lun Lee, Ohio State Univ., Columbus
Chun-Wu Leng, Ohio State Univ., Columbus
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 36,   Citation Count: 18
DOI: http://doi.acm.org/10.1145/65935.65937

ABSTRACT

A signature file acts as a filtering mechanism to reduce the amount of text that needs to be searched for a query. Unfortunately, the signature file itself must be exhaustively searched, which degrades performance when the file is large. We propose to use a deterministic algorithm to divide a signature file into partitions, each of which contains signatures with the same “key.” The signature keys in a partition can be extracted and represented as the partition's key. The search can then be confined to the subset of partitions whose keys match the query key. Our main concern here is to study methods for obtaining the keys and their performance in terms of their ability to reduce the search space. Owing to the reduction of search space, partitioning a signature file has a direct benefit in a sequential search (single-processor) environment. In a parallel environment, search can be conducted in parallel effectively by allocating one or more partitions to a processor. Partitioning the signature file with a deterministic method (as opposed to a random partitioning scheme) provides intraquery parallelism as well as interquery parallelism. In this paper, we outline the criteria for evaluating partitioning schemes. Three algorithms are described and studied. An analytical study of the performance of the algorithms is provided and the results are verified with simulation.
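The partition-and-filter idea in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper. The abstract does not say how the three algorithms derive keys, so the sketch assumes one simple deterministic choice: the key is a fixed-length prefix of a superimposed-coding signature. The signature width, the number of bits set per word, the key length, and all function names are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict

SIG_BITS = 64       # signature width in bits (assumed for illustration)
BITS_PER_WORD = 3   # bits set per word by superimposed coding (assumed)
KEY_BITS = 4        # length of the signature prefix used as the key (assumed)


def word_signature(word):
    """Set BITS_PER_WORD hash-selected bit positions for one word."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def block_signature(words):
    """Superimpose (bitwise OR) the word signatures of a text block."""
    sig = 0
    for word in words:
        sig |= word_signature(word)
    return sig


def key_of(sig):
    """Deterministic key: the top KEY_BITS bits of the signature."""
    return sig >> (SIG_BITS - KEY_BITS)


def build_partitions(blocks):
    """Group (block_id, signature) pairs by their signature key."""
    partitions = defaultdict(list)
    for block_id, words in blocks.items():
        sig = block_signature(words)
        partitions[key_of(sig)].append((block_id, sig))
    return partitions


def search(partitions, query_words):
    """Scan only partitions whose key covers every bit of the query key."""
    query = block_signature(query_words)
    query_key = key_of(query)
    hits, scanned = [], 0
    for part_key, entries in partitions.items():
        if part_key & query_key != query_key:
            continue  # no signature in this partition can match the query
        scanned += 1
        for block_id, sig in entries:
            if sig & query == query:  # candidate; may still be a false drop
                hits.append(block_id)
    return sorted(hits), scanned


if __name__ == "__main__":
    blocks = {
        0: ["signature", "file"],
        1: ["parallel", "search"],
        2: ["text", "retrieval"],
    }
    parts = build_partitions(blocks)
    hits, scanned = search(parts, ["parallel"])
    print(f"candidates={hits}, partitions scanned={scanned} of {len(parts)}")
```

Because a matching signature must contain every bit of the query signature, its prefix must contain every bit of the query prefix, so skipping the other partitions never loses a match. The candidates can still include false drops, which is a property of signature files in general. In a parallel setting, each partition list could be handed to a separate worker.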


REVIEW

Reviewer: Esen A. Ozkarahan

The search of very large databases requires special hardware and software architectures to achieve acceptable performance. Text (document) databases are among the largest, which further complicates the search problem. In order to achieve effic...
