|
ABSTRACT
This paper compares text retrieval methods intended for office systems. The operational requirements of the office environment are discussed, and retrieval methods from database systems and from information retrieval systems are examined. We classify these methods and examine the most interesting representatives of each class. Attempts to speed up retrieval with special purpose hardware are also presented, and issues such as approximate string matching and compression are discussed. A qualitative comparison of the examined methods is presented. The signature file method is discussed in more detail.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
|
 |
3
|
|
| |
4
|
ANGELL, R. C., FREUND, G. E., AND WILLET, P. 1983. Automatic spelling correction using a trigram similarity measure. Inf. Process. Manage. 19, 4, pp. 255-261.
|
 |
5
|
|
| |
6
|
BATCHER, K. E. 1968. Sorting networks and their applications. In Proceedings of the AFIPS Spring Joint Computer Conference. AFIPS Press, Reston, Va., pp. 307-314.
|
| |
7
|
BAYER, R., AND MCCREIGHT, E. 1972. Organization and maintenance of large ordered indexes. Acta Inf. 1, 3, pp. 173-189.
|
 |
8
|
|
 |
9
|
|
| |
10
|
BOURNE, C. P. 1963. Methods of Information Handling. Wiley, New York.
|
| |
11
|
BOURNE, C. P. 1977. Frequency and impact of spelling errors in bibliographic databases. Inf. Process. Manage. 13, 1, pp. 1-12.
|
 |
12
|
|
| |
13
|
BRAUEN, T. 1971. Document vector modification. In The SMART Retrieval System--Experiments in Automatic Document Processing, G. Salton, Ed. Prentice-Hall, Englewood Cliffs, N.J., Ch. 24.
|
| |
14
|
BURKOWSKI, F. J. 1982. A hardware hashing scheme in the design of a multiterm string comparator. IEEE Trans. Comput. C-31, 9 (Sept.), 825-834.
|
| |
15
|
CHRISTODOULAKIS, S. 1983. Access files for hatching queries in large information systems. In Proceedings of ICOD II (Aug.).
|
| |
16
|
|
| |
17
|
CHRISTODOULAKIS, S., AND FALOUTSOS, C. 1984. Design considerations for a message file server. IEEE Trans. Softw. Eng. SE-IO, 2 (Mar.), 201- 210.
|
 |
18
|
|
| |
19
|
COOPER, W. S. 1970. On deriving design equations for information retrieval systems. J. Am. Soc. Inf. Sci. (Nov.-Dec.).
|
| |
20
|
CROFT, W. B. 1980. A model of cluster searching based on classification. Inf. Syst. 5, 189-195.
|
 |
21
|
|
| |
22
|
DATrOLA, R. 1979. FIRST: Flexible information retrieval system for text. J. Am. Soc. Inf. Sci, 30, (Jan.), 9-14.
|
| |
23
|
DE LA BRIANDAIS, S. R. 1959. File searching using variable length keys. In Proceedings of the AFIPS Spring Joint Computer Conference. AFIPS Press, Reston, Va., pp. 295-298.
|
| |
24
|
DEWEY, C. 1950. Relative Frequency of English Speech Sounds. Harvard University Press, Cambridge, Mass.
|
| |
25
|
DUDA, R. O., AND HART, P. E. 1973. Pattern Classification and Scene Analysis. Wiley, New York.
|
 |
26
|
|
| |
27
|
FALOUTSOS, C. 1985a. Design of a signature file method that accounts for nonuniform occurrence and query frequencies. In Proceedings of the 11th Conference on Very Large Data Bases (Stockholm, Sweden, Aug.), to appear.
|
 |
28
|
|
 |
29
|
|
| |
30
|
FILES, J. R., A~D HUSKEY, H. D. 1969. An information retrieval system based on superimposed coding. In Proceedings of the Fall Joint Computer Conference, vol. 35. AFIPS Press, Reston, Va., pp. 423- 432.
|
| |
31
|
Fox, E. A. 1984. Extended information retrieval with data and text. PODS, submitted for publication.
|
 |
32
|
|
| |
33
|
FRIEDMAN, S. R., MACEYAK, J. A., AND WEISS, S. F. 1971. A relevance feedback system based on document transformation. In The SMART Retrieval System--Experiments in Automatic Document Processing, G. Salton, Ed., Prentice-Hall Englewood Cliffs, N.J., Ch. 23.
|
 |
34
|
|
| |
35
|
GALLAGER, R. G., AND VAN VOORH{$, D. C. 1975. Optimal source codes for geometrically distributed integer alphabets. IEEE Trans. Inf. Theor. IT~21 (March), 228-230.
|
| |
36
|
GOLOMB, S. W. 1966. Run length encodings. IEEE Trans. Inf. Theor. IT-12 (July), 399-401.
|
| |
37
|
GRAVINA, C. M. 1978. National Westminster Bank mass storage archiving. IBM Syst. J. 17, 4, 344- 358.
|
 |
38
|
|
 |
39
|
|
 |
40
|
|
| |
41
|
HASKIN, R. L. 1981. Special-purpose processors for text retrieval. Database Eng. 4, i (Sept.), 16-29.
|
 |
42
|
|
 |
43
|
|
 |
44
|
|
| |
45
|
HOLLAAR, L. A. 1979. Text retrieval computers. IEEE Comput. Mag. 12, 3 (Mar.), 40-50.
|
| |
46
|
HOLLAAR, L. A., SMITH, K. F., CHOW, W. H., EMRATH, P. A., AND HASKIN, R. L. 1983. Architecture and operation of a large, full-text information-retrieval system. In Advanced Database Machine Architecture, D. K. Hsiao, Ed. Prentice- Hall, Englewood Cliffs, N.J., pp. 256-299.
|
| |
47
|
|
| |
48
|
IBM 1979. STAIRS/VS: Reference Manual. IBM Systern Manual.
|
| |
49
|
|
| |
50
|
KAUTZ, W. H., AND SINGLETON, R. C. 1964. Nonrandom binary superimposed codes. IEEE Trans. Inf. Theor. IT-IO (Oct.), 363-377.
|
| |
51
|
KNOTT, G. D. 1971. Expandable open addressing hash table storage and retrieval. In Proceedings of the ACM SIGFIDET Conference (San Diego, Calif.). ACM, New York, pp. 187-206.
|
| |
52
|
KNOTT, G. D. 1975. Hashing functions. Comput. J. 18, 3, pp. 265-278.
|
| |
53
|
|
| |
54
|
KNUTH, D. E., MORRIS, J. H., AND PRATT, V. R. 1977. Fast pattern matching in strings. SlAM J. Cornput. 6, 2 (June), 323-350.
|
| |
55
|
LARSON, P. 1978. Dynamic hashing. BIT 18, pp. 184- 201.
|
 |
56
|
|
| |
57
|
LESK, M. E. 1978. Some applications of inverted indexes on the UNIX system. In the UNIX Programmer's Manual. Bell Laboratories, Murray Hill, N.J.
|
| |
58
|
LLOYD, J. W. 1980. Optimal partial-match retrieval. BIT 20, pp. 406-413.
|
| |
59
|
LLOYD, J. W., AND RAMAMOHANARAO, K. 1982. Partial-match retrieval for dynamic files. BIT, 22, pp. 150-168.
|
 |
60
|
|
| |
61
|
|
| |
62
|
MCILROY, M. D. 1982. Development of a spelling list. IEEETrans. Commun. COM-30, 1 (Jan.), 91-99.
|
| |
63
|
McLEoo, I. A. 1981. A database management system for document retrieval applications. Inf. Syst. 6, 2, pp. 131-137.
|
| |
64
|
MOOERS, C. 1949. Application of random codes to the gathering of statistical information. Bull. 31, Zator Co., Cambridge, Mass., 1949. Based on M.S.C. thesis, MIT, Jan. 1948.
|
 |
65
|
|
| |
66
|
OROSZ, G., AND TACKACS, L. 1956. Some probability problems concerning the marking of codes into the superimposed field. J. Doc. 12, 4 (Dec.), 231- 234.
|
 |
67
|
|
 |
68
|
|
| |
69
|
|
| |
70
|
RIVEST, R. L. ~976. Partial match retrieval algorithms. SIAM J. Comput. 5, 1 (Mar.), 19-50.
|
| |
71
|
ROBERTS, C. S. 1979. Partial-match retrieval via the method of superimposed codes. Proc. iEEE 67, 12 (Dec.), 1624-1642.
|
 |
72
|
|
| |
73
|
Roccmo, J. J. 1971. Relevance feedback in information retrieval. In The SMART Retrieval System-- Experiments in Automatic Document Processing, G. Salton, Ed., Prentice-Hall, Englewood Cliffs, N.J., Ch. 14.
|
 |
74
|
|
| |
75
|
|
| |
76
|
SALTON, G. 1971b. Relevance feedback and the optimization of retrieval effectiveness. In The SMART Retrieval System--Experiments in Automatic Document Processing, G. Salton, Ed., Prentice-Hall, Englewood Cliffs, N.J., Ch. 15.
|
| |
77
|
SALTON, G. 1972. Experiments in automatic thesaurus construction for information retrieval. In Informarion Processing 71. North-Holland, Amsterdam, pp. 115-123.
|
 |
78
|
|
| |
79
|
|
| |
80
|
SALTO~, G. 1980. Automatic information retrieval. IEEE Comput. Mag. 13, 9 (Sept.), 41-56.
|
| |
81
|
|
 |
82
|
|
 |
83
|
|
| |
84
|
SCHUEGRAPH, E. J., ANO HEAPS, H. 8. 1976. Query processing in a retrospective document retrieval system that uses word fragments as language elements. Inf. Process. Manage. 12, pp. 283- 292.
|
 |
85
|
|
| |
86
|
SEVERANCE, D. G. 1983. A practitioner's Guide to database compression. In{. Syst. 8, 1, 51-62.
|
| |
87
|
8PARCK-JONES, K. 1972. A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28, 1 (Mar.), 11-20.
|
| |
88
|
STELLHORN, W. H. 1977. An inverted file processor for information retrieval. IEEE Trans. Comput. C-26, 12 (Dec.), 1258-1267.
|
| |
89
|
STIASSNY, 8. 1960. Mathematical analysis of various superimposed coding methods. Am. Doc. 11, 2 (Feb.), 155-169.
|
 |
90
|
|
 |
91
|
|
| |
92
|
Dennis Tsichritzis , Stavros Christodoulakis , P. Economopoulos , Christos Faloutsos , A. Lee , D. Lee , J. Vandenbroeck , Carson C. Woo, A Multimedia Office Filing System, Proceedings of the 9th International Conference on Very Large Data Bases, p.2-7, October 31-November 02, 1983
|
| |
93
|
VAN RIJSBERGEN, C. J. 1971. An algorithm for information structuring and retrieval. Comput. J. 14, 4, pp. 407-412.
|
| |
94
|
|
| |
95
|
WELCH, T. A. 1984. A technique for high-performance data compression. IEEE Comput. Mag. 17, 6 (June), 8-19.
|
| |
96
|
WINe, J. M. 1979. Partial-match retrieval using TRIES, hashing and superimposed codes. M.Sc. thesis, MIT, June.
|
 |
97
|
|
 |
98
|
|
| |
99
|
ZAHN, C. T. 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. C-20, i (Jan.), 68-86.
|
| |
100
|
ZIV, J., AND LEMPEL, A. 1977. A universal algorithm for sequential data compression. IEEE Trans. In{. Theor. IT-23, 3 (May), 337-343.
|
CITED BY 86
|
|
|
|
|
|
|
|
Lon-Mu Liu , Yair M. Babad , Wei Sun , Ki-Kan Chan, Adaptive post-processing of OCR text via knowledge acquisition, Proceedings of the 19th annual conference on Computer Science, p.558-569, April 1991, San Antonio, Texas, United States
|
|
|
|
|
|
|
|
|
|
|
|
J. K. Cringean , R. England , G. A. Manson , P. Willett, Parallel text searching in serial files using a processor farm, Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval, p.429-453, September 05-07, 1990, Brussels, Belgium
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Robert W.P. Luk , H. V. Leong , Tharam S. Dillon , Alvin T.S. Chan , W. Bruce Croft , James Allan, A survey in indexing and searching XML documents, Journal of the American Society for Information Science and Technology, v.53 n.6, p.415-437, May, 2002
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Charu C. Aggarwal , Joel L. Wolf , Kun-Lung Wu , Philip S. Yu, Horting hatches an egg: a new graph-theoretic approach to collaborative filtering, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, p.201-212, August 15-18, 1999, San Diego, California, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yong Kyu Lee , Seong-Joon Yoo , Kyoungro Yoon , P. Bruce Berra, Index structures for structured documents, Proceedings of the first ACM international conference on Digital libraries, p.91-99, March 20-23, 1996, Bethesda, Maryland, United States
|
|
|
Tamas E. Doszkocs, IR, NLP, AI and UFOS: or IR-relevance, natural language problems, artful intelligence and user-friendly online systems, Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval, p.49-57, September 1986, Palazzo dei Congressi, Pisa, Italy
|
|
|
|
|
|
|
|
|
John Light, A distributed, graphical, topic-oriented document search system, Proceedings of the sixth international conference on Information and knowledge management, p.285-292, November 10-14, 1997, Las Vegas, Nevada, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
S. Ayse Özel , I. Sengör Altingövde , Özgür Ulusoy , Gültekin Özsoyoǧlu , Z. Meral Özsoyoǧlu, Metadata-based modeling of information resources on the Web, Journal of the American Society for Information Science and Technology, v.55 n.2, p.97-110, January 15, 2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
REVIEW
"Harold Borko : Reviewer"
Automated text retrieval methods are used in libraries and in many online
bibliographic and numerical data files. They provide the library user with fast
and accurate access to books and journal papers. Text retrieval is also
important in office
more...
|