ACM Home Page
Please provide us with feedback. Feedback
Optimal weight assignment for signature generation
Full text PdfPdf (1.52 MB)
Source ACM Transactions on Database Systems (TODS) archive
Volume 17 ,  Issue 2  (June 1992) table of contents
Pages: 346 - 373  
Year of Publication: 1992
ISSN:0362-5915
Authors
Chun-Wu Roger Leng  Ohio State Univ., Columbus
Dik Lun Lee  Ohio State Univ., Columbus
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 26,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/128903.128907
What is a DOI?

ABSTRACT

Previous work on superimposed coding has been characterized by two aspects. First, it is generally assumed that signatures are generated from logical text blocks of the same size; that is, each block contains the same number of unique terms after stopword and duplicate removal. We call this approach the fixed-size block (FSB) method, since each text block has the same size, as measured by the number of unique terms contained in it. Second, with only a few exceptions [6,7,8,9,17], most previous work has assumed that each term in the text contributes the same number of ones to the signature (i.e., the weight of the term signatures is fixed). The main objective of this paper is to derive an optimal weight assignment that assigns weights to document terms according to their occurrence and query frequencies in order to minimize the false-drop probability. The optimal scheme can account for both uniform and nonuniform occurence and query frequencies, and the signature generation method is still based on hashing rather than on table lookup. Furthermore, a new way of generating signatures, the fixed-weight block (FWB) method, is introduced. FWB controls the weight of every signature to a constant, whereas in FSB, only the expected signature weight is constant. We have shown that FWB has a lower false-drop probability than that of the FSB method, but its storage overhead is slightly higher. Other advantages of FWB are that the optimal weight assignment can be obtained analytically without making unrealistic assumptions and that the formula for computing the term signature weights is simple and efficient.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
APOSTOL, T M. Mathematical AnalysLs. 2nd ed Addison-Wesley, Reading, Mass., 1974.
 
2
3
 
4
5
 
6
FALOUTSOS, C., AND CHRISTODOULAKIS, S. Design of a signature file method that accounts for non-uniform occurrence and query frequencies. In Proceedings of the 11th International Conference on Very Large Data Bases (Stockholm, Aug. 1985). VLDB Endowment, 1985, pp. 165-170.
7
8
 
9
KENT, A., RAMAMOHANARAO, K., AND SACKS-DAviS, R. A signature file scheme based on multiple organizations for indexing very large databases. J. Am. Soc. Inf. Sci. 41, 7 (1990), 508-534.
 
10
KIM, Y. M., LEE, D. L., AND GAURAV, P. Efficient search methods for signature files. Submitted for publication.
 
11
12
 
13
14
 
15
ROBERTS, C.S. Partial-match retrieval via method of superimposed codes. Proc. IEEE 67, 12 (Dec. 1979), 1624-1642.
 
16
SACKS-DAvIS, R., AND RAMAMOHANARAO, K. A two level superimposed coding scheme for partial match retrieval. Inf. Syst. 8, 4 (1983), 273-280.
17
 
18
19
 
20
STIASSNY, S. Mathematical analysis of various superimposed coding methods. Am. Doc. 11, 2 (Feb. 1960), 155-169.
21



REVIEW

"Fazli Can : Reviewer"

Due to the information loss that takes place during signature generation, the common concern of all signature generation methods is to minimize the false drop probability without generating too much space overhead. The traditional fixed-size blo  more...

Collaborative Colleagues:
Chun-Wu Roger Leng: colleagues
Dik Lun Lee: colleagues