|
ABSTRACT
Signature files have attracted a lot of interest as an access method for text and specifically for messages in the office environment. Messages are stored sequentially in the message file, whereas their hash-coded abstractions (signatures) are stored sequentially in the signature file. To answer a query, the signature file is examined first, and many nonqualifying messages are immediately rejected. In this paper we examine the problem of designing signature extraction methods and studying their performance. We describe two old methods, generalize another one, and propose a new method and its variation. We provide exact and approximate formulas for the dependency between the false drop probability and the signature size for all the methods, and we show that the proposed method (VBC) achieves approximately ten times smaller false drop probability than the old methods, whereas it is well suited for collections of documents with variable document sizes.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
 |
3
|
|
| |
4
|
CHRISTODOULAKIS, S., AND FALOUTSOS, C. Design considerations for a message file server. IEEE Trans. Softw. Eng. SE-IO, 2 (Mar. 1984), 201-210.
|
 |
5
|
S. Christodoulakis , F. Ho , M. Theodoridou, The multimedia object presentation manager of MINOS: a symmetric approach, Proceedings of the 1986 ACM SIGMOD international conference on Management of data, p.295-310, May 28-30, 1986, Washington, D.C., United States
|
| |
6
|
FALOUTSOS, C. Integrated access methods for messages using signature files. In IFIP Conference on Methods and Tools for O{fice systems (Pisa, Italy, Oct.), 1986, pp. 135-157.
|
 |
7
|
|
 |
8
|
|
| |
9
|
GALLAGER, R. G., AND VAN VOORHIS, D.C. Optimal source codes for geometrically distributed integer alphabets. IEEE Trans. Inf. Theory IT-21 (Mar. 1975), 228-230.
|
| |
10
|
GOLOMB, S.W. Run length encodings. IEEE Trans. Inf. Theory IT-12 (July 1966), 399-401.
|
| |
11
|
GRAVINA, C.M. National Westminster Bank mass storage archiving. IBM Syst. J. 17, 4 (1978), 344-358.
|
| |
12
|
GRAY, P., ED. Special issue on Meridian system information. Telesis 12, 2, 1985.
|
 |
13
|
|
| |
14
|
|
| |
15
|
HOLLAAR, L.A. Text retrieval computers. Computer 12, 3 (Mar. 1979), 40-50.
|
| |
16
|
HOLLAAR, L. A., SMITH, K. F., CHOW, W. H., EMRATH, P. A., AND HASKIN, R.a. Architecture and operation of a large, full-text information-retrieval system, in Advanced Database Machine Architecture, D. K. Hsiao, Ed. Prentice-Hall, Englewood Cliffs, New Jersey, 1983, pp. 256-299.
|
| |
17
|
IBM. STAIRS/VS: Reference Manual. IBM System Manual, IBM Corp., Armonk, N.Y.
|
| |
18
|
KNUTH, D. E., MORRIS, J. H., AND PRATT, V. R. Fast pattern matching in strings. SIAM J. Comput. 6, 2 {June 1977), 323-350.
|
 |
19
|
|
| |
20
|
MCILROY, M.D. Development of a spelling list. IEEE Trans. Commun. COM-30, 1 (Jan. 1982), 91-99.
|
| |
21
|
MOCKAPETRIS, P. The domain name server. ISI/RS-84-133, University of Southern California Information Science Institute, Los Angeles, Calif., June 1984.
|
| |
22
|
MOOERS, C. Application of random codes to the gathering of statistical information. Bulletin 31, Zator Co., Cambridge, Mass., 1949.
|
| |
23
|
|
 |
24
|
|
| |
25
|
Andrew A. Poggio , J. J. Garcia Luna Aceves , Earl J. Craighill , Douglas B. Moran , Lorenzo Aguilar , David R. L. Worthington , John M. Hight, CCWS: a computer-based, multimedia information system, Computer, v.18 n.10, p.92-103, Oct. 1985
[doi> 10.1109/MC.1985.1662718]
|
| |
26
|
ROBERTS, C.S. Partial-match retrieval via the method of superimposed codes. In Proceedings of the IEEE 67, 12 (Dec. 1979), 1624-1642.
|
| |
27
|
SACKS-DAVIS, R., AND RAMAMOHANARAO, K. A two level superimposed coding scheme for partial match retrieval. Inf. Syst. 8, 4, 1983, 273-280.
|
| |
28
|
|
| |
29
|
SOLOMON, M., LANDWEBER, L., AND NEUHENGEN, D. The CSNET name server. Comput. Networks 6, 3, (July 1982), 161-172.
|
| |
30
|
STANDISH, W.A. An essay on software reuse. IEEE Trans. Softw. Eng. SE-IO, 5 (Sept. 1984), 494-497.
|
 |
31
|
|
| |
32
|
STIASSNY, S. Mathematical analysis of various superimposed coding methods. Am. Documentation 11, 2 (Feb. 1960), 155-169.
|
 |
33
|
|
| |
34
|
Dennis Tsichritzis , Stavros Christodoulakis , P. Economopoulos , Christos Faloutsos , A. Lee , D. Lee , J. Vandenbroeck , Carson C. Woo, A Multimedia Office Filing System, Proceedings of the 9th International Conference on Very Large Data Bases, p.2-7, October 31-November 02, 1983
|
| |
35
|
|
REVIEW
"Dennis Roy Thompson : Reviewer"
Christos and Stavros stuck in their thumbs and pulled out a signature file
method plum. Access method classifications are full text scanning, inversion,
signature files, clustering, and multi-attribute hashing. Signature file
methods are concern
more...
|