ACM Home Page
Please provide us with feedback. Feedback
Description and performance analysis of signature file methods for office filing
Full text PdfPdf (1.31 MB)
Source ACM Transactions on Information Systems (TOIS) archive
Volume 5 ,  Issue 3  (July 1987) table of contents
Pages: 237 - 257  
Year of Publication: 1987
ISSN:1046-8188
Authors
Christos Faloutsos  Univ. of Maryland, College Park
Stavros Christodoulakis  Univ. of Waterloo, Waterloo, Ont., Canada
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 41,   Citation Count: 30
Additional Information:

abstract   references   cited by   index terms   review   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/27641.28057
What is a DOI?

ABSTRACT

Signature files have attracted a lot of interest as an access method for text and specifically for messages in the office environment. Messages are stored sequentially in the message file, whereas their hash-coded abstractions (signatures) are stored sequentially in the signature file. To answer a query, the signature file is examined first, and many nonqualifying messages are immediately rejected. In this paper we examine the problem of designing signature extraction methods and studying their performance. We describe two old methods, generalize another one, and propose a new method and its variation. We provide exact and approximate formulas for the dependency between the false drop probability and the signature size for all the methods, and we show that the proposed method (VBC) achieves approximately ten times smaller false drop probability than the old methods, whereas it is well suited for collections of documents with variable document sizes.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
CHRISTODOULAKIS, S., AND FALOUTSOS, C. Design considerations for a message file server. IEEE Trans. Softw. Eng. SE-IO, 2 (Mar. 1984), 201-210.
5
 
6
FALOUTSOS, C. Integrated access methods for messages using signature files. In IFIP Conference on Methods and Tools for O{fice systems (Pisa, Italy, Oct.), 1986, pp. 135-157.
7
8
 
9
GALLAGER, R. G., AND VAN VOORHIS, D.C. Optimal source codes for geometrically distributed integer alphabets. IEEE Trans. Inf. Theory IT-21 (Mar. 1975), 228-230.
 
10
GOLOMB, S.W. Run length encodings. IEEE Trans. Inf. Theory IT-12 (July 1966), 399-401.
 
11
GRAVINA, C.M. National Westminster Bank mass storage archiving. IBM Syst. J. 17, 4 (1978), 344-358.
 
12
GRAY, P., ED. Special issue on Meridian system information. Telesis 12, 2, 1985.
13
 
14
 
15
HOLLAAR, L.A. Text retrieval computers. Computer 12, 3 (Mar. 1979), 40-50.
 
16
HOLLAAR, L. A., SMITH, K. F., CHOW, W. H., EMRATH, P. A., AND HASKIN, R.a. Architecture and operation of a large, full-text information-retrieval system, in Advanced Database Machine Architecture, D. K. Hsiao, Ed. Prentice-Hall, Englewood Cliffs, New Jersey, 1983, pp. 256-299.
 
17
IBM. STAIRS/VS: Reference Manual. IBM System Manual, IBM Corp., Armonk, N.Y.
 
18
KNUTH, D. E., MORRIS, J. H., AND PRATT, V. R. Fast pattern matching in strings. SIAM J. Comput. 6, 2 {June 1977), 323-350.
19
 
20
MCILROY, M.D. Development of a spelling list. IEEE Trans. Commun. COM-30, 1 (Jan. 1982), 91-99.
 
21
MOCKAPETRIS, P. The domain name server. ISI/RS-84-133, University of Southern California Information Science Institute, Los Angeles, Calif., June 1984.
 
22
MOOERS, C. Application of random codes to the gathering of statistical information. Bulletin 31, Zator Co., Cambridge, Mass., 1949.
 
23
24
 
25
 
26
ROBERTS, C.S. Partial-match retrieval via the method of superimposed codes. In Proceedings of the IEEE 67, 12 (Dec. 1979), 1624-1642.
 
27
SACKS-DAVIS, R., AND RAMAMOHANARAO, K. A two level superimposed coding scheme for partial match retrieval. Inf. Syst. 8, 4, 1983, 273-280.
 
28
 
29
SOLOMON, M., LANDWEBER, L., AND NEUHENGEN, D. The CSNET name server. Comput. Networks 6, 3, (July 1982), 161-172.
 
30
STANDISH, W.A. An essay on software reuse. IEEE Trans. Softw. Eng. SE-IO, 5 (Sept. 1984), 494-497.
31
 
32
STIASSNY, S. Mathematical analysis of various superimposed coding methods. Am. Documentation 11, 2 (Feb. 1960), 155-169.
33
 
34
 
35

CITED BY  30


REVIEW

"Dennis Roy Thompson : Reviewer"

Christos and Stavros stuck in their thumbs and pulled out a signature file method plum. Access method classifications are full text scanning, inversion, signature files, clustering, and multi-attribute hashing. Signature file methods are concern  more...

Collaborative Colleagues:
Christos Faloutsos: colleagues
Stavros Christodoulakis: colleagues