ABSTRACT
A signature file acts as a filtering mechanism to reduce the amount of text that needs to be searched for a query. Unfortunately, the signature file itself must be exhaustively searched, resulting in degraded performance for a large file size. We propose to use a deterministic algorithm to divide a signature file into partitions, each of which contains signatures with the same “key.” The signature keys in a partition can be extracted and represented as the partition's key. The search can then be confined to the subset of partitions whose keys match the query key. Our main concern here is to study methods for obtaining the keys and their performance in terms of their ability to reduce the search space.
Owing to the reduction of search space, partitioning a signature file has a direct benefit in a sequential search (single-processor) environment. In a parallel environment, search can be conducted in parallel effectively by allocating one or more partitions to a processor. Partitioning the signature file with a deterministic method (as opposed to a random partitioning scheme) provides intraquery parallelism as well as interquery parallelism.
In this paper, we outline the criteria for evaluating partitioning schemes. Three algorithms are described and studied. An analytical study of the performance of the algorithms is provided and the results are verified with simulation.
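The partitioned search summarized above can be illustrated with a minimal sketch. It is not one of the three algorithms studied in the paper, which the abstract does not specify. It assumes superimposed-coding signatures and takes the leading bits of each signature as its key. The names and parameters (SIG_BITS, KEY_BITS, BITS_PER_WORD, key_of, build_partitions, search) are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict

# All three parameters are assumptions made for this sketch.
SIG_BITS = 32       # width of a signature in bits
KEY_BITS = 4        # number of leading signature bits used as the key
BITS_PER_WORD = 3   # bits each word sets (fewer if positions collide)


def word_signature(word):
    """Hash a word to a few bit positions in a SIG_BITS-wide signature."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def block_signature(words):
    """Superimpose (bitwise OR) the word signatures of a text block."""
    sig = 0
    for word in words:
        sig |= word_signature(word)
    return sig


def key_of(sig):
    """Assumed key scheme: the KEY_BITS most significant signature bits."""
    return sig >> (SIG_BITS - KEY_BITS)


def build_partitions(documents):
    """Deterministically group signatures so each partition shares one key."""
    partitions = defaultdict(list)
    for doc_id, words in documents.items():
        sig = block_signature(words)
        partitions[key_of(sig)].append((doc_id, sig))
    return partitions


def search(partitions, query_words):
    """Scan only partitions whose key covers every bit of the query key."""
    query_sig = block_signature(query_words)
    query_key = key_of(query_sig)
    candidates, scanned = [], 0
    for key, entries in partitions.items():
        if key & query_key != query_key:
            continue  # no signature in this partition can match the query
        scanned += 1
        for doc_id, sig in entries:
            if sig & query_sig == query_sig:
                candidates.append(doc_id)
    return sorted(candidates), scanned


docs = {
    1: ["signature", "file"],
    2: ["parallel", "search"],
    3: ["text", "retrieval", "signature"],
}
parts = build_partitions(docs)
hits, scanned = search(parts, ["signature"])
print(hits, "candidates found after scanning", scanned, "of", len(parts), "partitions")
```

The returned candidates may include false drops, so the text of each candidate still has to be checked against the query. In a parallel setting, each entry of the partition table could be assigned to a separate processor, since the key test for one partition is independent of the others.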
REVIEW
Reviewer: Esen A. Ozkarahan
The search of very large databases requires special hardware and
software architectures to achieve acceptable performance. Text
(document) databases are among the largest, which further complicates
the search problem. In order to achieve effic