ABSTRACT
A component theory of information retrieval, using single content terms as components for queries and documents, was reviewed and tested experimentally. The theory has the advantages of being able to (1) bootstrap itself, that is, define initial term weights naturally based on the fact that items are self-relevant; (2) make use of within-item term frequencies; (3) account for query-focused and document-focused indexing and retrieval strategies cooperatively; and (4) allow for component-specific feedback if such information is available. Retrieval results with four collections support the effectiveness of the first three aspects, except for predictive retrieval. At the initial indexing stage, the retrieval theory performed much more consistently across collections than Croft's model and provided results comparable to Salton's tf*idf approach. An inverse collection term frequency (ICTF) formula was also tested and performed much better than the inverse document frequency (IDF). With full-feedback retrospective retrieval, the component theory performed substantially better than Croft's, because of the highly specific nature of document-focused feedback. Repetitive retrieval results with partial relevance feedback mirrored those for the retrospective case. However, for the important case of predictive retrieval using residual ranking, results were not unequivocal.
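The difference between IDF and ICTF mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the code below assumes the common textbook forms, log(N / df) for IDF and log(T / cf) for ICTF, where N is the number of documents, df the number of documents containing the term, T the total number of term occurrences in the collection, and cf the term's total occurrences. The function name `idf_and_ictf` and the toy collection are hypothetical.

```python
import math
from collections import Counter

def idf_and_ictf(docs):
    """Return (idf, ictf) dicts for a list of tokenized documents.

    Assumed forms (not taken from the paper):
      idf(t)  = log(N / df(t))
      ictf(t) = log(T / cf(t))
    """
    n_docs = len(docs)
    total_tokens = sum(len(d) for d in docs)
    doc_freq = Counter()   # number of documents containing each term
    coll_freq = Counter()  # total occurrences of each term in the collection
    for doc in docs:
        coll_freq.update(doc)       # counts every occurrence
        doc_freq.update(set(doc))   # counts each term once per document
    idf = {t: math.log(n_docs / doc_freq[t]) for t in doc_freq}
    ictf = {t: math.log(total_tokens / coll_freq[t]) for t in coll_freq}
    return idf, ictf

# Toy collection: "retrieval" and "model" each occur in two documents,
# but "retrieval" occurs four times overall and "model" only twice.
docs = [
    ["retrieval", "retrieval", "retrieval", "model"],
    ["retrieval", "term"],
    ["model", "term"],
    ["index", "term"],
]
idf, ictf = idf_and_ictf(docs)
```

In this toy collection IDF gives "retrieval" and "model" the same weight, because both appear in two documents, while ICTF gives "retrieval" a lower weight because it takes within-item frequencies into account. This only shows how the two statistics can diverge; it does not reproduce the retrieval experiments reported in the abstract.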
CITED BY 18
Hinrich Schütze, David A. Hull, Jan O. Pedersen, A comparison of classifiers and document representations for the routing problem, Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, p.229-237, July 09-13, 1995, Seattle, Washington, United States
REVIEW
Reviewer: Richard S. Marcus
The author reviews his prior efforts in viewing documents for
indexing and retrieval purposes as being made of components such as
paragraphs or sentences, rather than simply as sets of terms, which is
how most representatives of the statistica