A critical investigation of recall and precision as measures of retrieval system performance
Source: ACM Transactions on Information Systems (TOIS)
Volume 7, Issue 3 (July 1989)
Pages: 205-229
Year of Publication: 1989
ISSN:1046-8188
Authors
Vijay Raghavan  Univ. of Southwestern Louisiana, Lafayette
Peter Bollmann  Technische Univ. Berlin, Berlin, W. Germany
Gwang S. Jung  Technische Univ. Berlin, Berlin, W. Germany
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 22,   Downloads (12 Months): 157,   Citation Count: 32
DOI: http://doi.acm.org/10.1145/65943.65945

ABSTRACT

Recall and precision are often used to evaluate the effectiveness of information retrieval systems. They are easy to define if there is a single query and if the retrieval result generated for the query is a linear ordering. However, when the retrieval results are weakly ordered, in the sense that several documents have an identical retrieval status value with respect to a query, some probabilistic notion of precision has to be introduced. Relevance probability, expected precision, and so forth, are some alternatives mentioned in the literature for this purpose. Furthermore, when many queries are to be evaluated and the retrieval results averaged over these queries, some method of interpolation of precision values at certain preselected recall levels is needed. The currently popular approaches for handling both a weak ordering and interpolation are found to be inconsistent, and the results obtained are not easy to interpret. Moreover, in cases where some alternatives are available, no comparative analysis that would facilitate the selection of a particular strategy has been provided. In this paper, we systematically investigate the various problems and issues associated with the use of recall and precision as measures of retrieval system performance. Our motivation is to provide a comparative analysis of methods available for defining precision in a probabilistic sense and to promote a better understanding of the various issues involved in retrieval performance evaluation.
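The distinction the abstract draws between a linear ordering and a weak ordering can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the function names, the document identifiers, and the assumption that every ordering of tied documents is equally likely are all hypothetical choices made for this example. The paper compares several probabilistic definitions, and this sketch shows just one simple reading of "expected precision" at a fixed cutoff.

```python
from fractions import Fraction


def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall after the top k documents of a linear ordering."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return Fraction(hits, k), Fraction(hits, len(relevant))


def expected_precision_at_k(tied_groups, relevant, k):
    """Expected precision at cutoff k for a weak ordering.

    `tied_groups` lists groups of documents, best group first. Documents in
    one group share the same retrieval status value. ASSUMPTION (not from the
    paper): ties are broken uniformly at random.
    """
    total_docs = sum(len(group) for group in tied_groups)
    if not 0 < k <= total_docs:
        raise ValueError("k must be between 1 and the number of documents")

    expected_hits = Fraction(0)
    remaining = k
    for group in tied_groups:
        if remaining == 0:
            break
        take = min(remaining, len(group))
        relevant_in_group = sum(1 for doc in group if doc in relevant)
        # Each document in the group is retrieved with probability take/len(group).
        expected_hits += Fraction(relevant_in_group * take, len(group))
        remaining -= take
    return expected_hits / k


relevant_docs = {"d1", "d3"}

# Linear ordering: every document has its own rank.
linear = ["d1", "d2", "d3", "d4", "d5"]
print(precision_recall_at_k(linear, relevant_docs, 3))

# Weak ordering: d2, d3 and d4 share one retrieval status value.
weak = [["d1"], ["d2", "d3", "d4"], ["d5"]]
print(expected_precision_at_k(weak, relevant_docs, 2))
```

With the weak ordering above and a cutoff of 2, the second retrieved document is drawn from a tie of three documents of which one is relevant, so precision is no longer a single observed value but an expectation over tie-breaking outcomes.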


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
BOLLMANN, P. A comparison of evaluation measures for document retrieval systems. J. Informatics 1 (1977), 97-116.
 
2
 
3
4
 
5
BOLLMANN, P., RAGHAVAN, V. V., JUNG, G. S., AND SHU, L. Probability of relevance and expected precision in evaluating retrieval performance. In preparation.
 
6
BOOKSTEIN, A., AND COOPER, W.S. A general mathematical model for information retrieval systems. Libr. Quarterly 46 (1976), 153-157.
 
7
 
8
CHERNIAVSKY, V. S., AND LAKHUTY, D. G. Problem of evaluating retrieval systems. I. Nauchno-Techniceskaya Informazia, Ser. 2, pp. 24-30 (in Russian). In English: Automatic Documentation and Mathematical Linguistics 4 (1970), pp. 9-26.
 
9
CLEVERDON, C. W. Evaluation of tests of information retrieval systems. J. Doc. 26 (1970), 55-67.
 
10
CLEVERDON, C. W. On the inverse relationship of recall and precision. J. Doc. 28 (1972), 195-201.
 
11
COOPER, W.S. Expected search length: A single measure of retrieval effectiveness based on the weak ordering action of retrieval systems. Am. Doc. 19 (1968), 30-41.
 
12
COOPER, W.S. On selecting a measure of retrieval effectiveness. Part II. Implementation of the philosophy. J. Am. Soc. Inf. Sci. 24 (Nov./Dec. 1973), 413-424.
 
13
COOPER, W.S. On selecting a measure of retrieval effectiveness. J. Am. Soc. Inf. Sci. 24 (1973), 87-100.
 
14
HEINE, M. H. Distance between sets as an objective measure of retrieval effectiveness. Inf. Storage and Retrieval 9 (1973), 181-198.
 
15
HEINE, M.H. The inverse relationship of precision and recall in terms of the Swets model. J. Doc. 29 (1973), 81-84.
 
16
HOEL, P.G. Introduction to Mathematical Statistics, 4th ed. Wiley, New York, 1971.
 
17
KRAFT, D. H., AND BOOKSTEIN, A. Evaluation of information retrieval systems: A decision theory approach. J. Am. Soc. Inf. Sci. 29 (1978), 31-40.
 
18
KRAFT, D. H., AND LEE, T. Stopping rules and their effect on expected search length. Inf. Process. Manage. 15 (1979), 47-58.
 
19
ROBERTSON, S. E. The parametric description of retrieval tests. Part II: Overall measures. J. Doc. 25 (1969), 93-107.
 
20
SALTON, G. Evaluation problems in interactive information retrieval. Inf. Storage and Retrieval 6 (1970), 29-44.
 
21
22
 
23
24
 
25
 
26
SALTON, G., AND YANG, C. S. On the specification of term values in automatic indexing. J. Doc. 29, 4 (1973), 351-372.
 
27
SALTON, G., YANG, C. S., AND YU, C. T. Contribution to the theory of indexing. In Information Processing 74, North-Holland, Amsterdam, The Netherlands, 1974, pp. 584-590.
 
28
SPARCK JONES, K. Performance averaging for recall and precision. J. Informatics 2 (1978), 95-105.
 
29
SUPPES, P. Introduction to Logic. Van Nostrand, New York, 1957.
 
30
SWETS, J.A. Effectiveness of information retrieval methods. Am. Doc. 20 (1969), 72-89.
 
31
VAN RIJSBERGEN, C.J. Foundations of evaluation. J. Doc. 30 (1974), 365-373.
 
32
33
 
34
YU, C. T., AND RAGHAVAN, V. V. A single-pass method for determining the semantic relationship between terms. J. Am. Soc. Inf. Sci. 28 (1977), 345-354.



REVIEW

Reviewer: Dagobert Soergel

This paper deals with the problem of defining performance measures under two conditions: (1) the retrieval system produces a list of items which is weakly rank-ordered by some relevance coefficient, and (2) performance is to be eva...
