ABSTRACT
Many experts in mechanized text processing now agree that useful automatic language analysis procedures are largely unavailable and that the existing linguistic methodologies generally produce disappointing results. An attempt is made in the present study to identify those automatic procedures which appear most effective as a replacement for the missing language analysis.
A series of computer experiments is described, designed to simulate a conventional document retrieval environment. It is found that a simple duplication, by automatic means, of the standard, manual document indexing and retrieval operations will not produce acceptable output results. New mechanized approaches to document handling are proposed, including document ranking methods, automatic dictionary and word list generation, and user feedback searches. It is shown that the fully automatic methodology is superior in effectiveness to the conventional procedures in normal use.