Cumulated gain-based evaluation of IR techniques
Source: ACM Transactions on Information Systems (TOIS), Volume 20, Issue 4 (October 2002)
Pages: 422-446
Year of Publication: 2002
ISSN: 1046-8188
Authors
Kalervo Järvelin, University of Tampere, Finland
Jaana Kekäläinen, University of Tampere, Finland
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/582415.582418

ABSTRACT

Modern large retrieval environments tend to overwhelm their users with their large output. Since not all documents are of equal relevance to their users, highly relevant documents should be identified and ranked first for presentation. In order to develop IR techniques in this direction, it is necessary to develop evaluation approaches and methods that credit IR methods for their ability to retrieve highly relevant documents. This can be done by extending traditional evaluation methods, that is, recall and precision based on binary relevance judgments, to graded relevance judgments. Alternatively, novel measures based on graded relevance judgments may be developed. This article proposes several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position. The first one accumulates the relevance scores of retrieved documents along the ranked result list. The second one is similar but applies a discount factor to the relevance scores in order to devalue late-retrieved documents. The third one computes the relative-to-the-ideal performance of IR techniques, based on the cumulative gain they are able to yield. These novel measures are defined and discussed, and their use is demonstrated in a case study using TREC data: sample system run results for 20 queries in TREC-7. As a relevance base we used novel graded relevance judgments on a four-point scale. The test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences. The graphs based on the measures also provide insight into the performance of IR techniques and allow interpretation, for example, from the user point of view.
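The three measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. CG is a running sum of gains down the ranked list. DCG divides the gain at rank i by log_b(i) once i reaches the base b (b = 2 is assumed here), and leaves earlier ranks undiscounted. The normalized variants divide each position by the same position of an ideal vector, which is assumed to be built by sorting all judged gains for the topic in decreasing order. The function names, the 0 to 3 gain values for the four-point scale, and the sample data are hypothetical.

```python
import math
from typing import List, Sequence


def cumulated_gain(gains: Sequence[float]) -> List[float]:
    """CG vector: CG[i] is the sum of the gains at ranks 1..i."""
    cg: List[float] = []
    total = 0.0
    for g in gains:
        total += g
        cg.append(total)
    return cg


def discounted_cumulated_gain(gains: Sequence[float], b: float = 2.0) -> List[float]:
    """DCG vector: gains at ranks i >= b are divided by log_b(i).

    Assumes b > 1, so rank 1 is always undiscounted and log_b(i) > 0 for i >= b.
    """
    dcg: List[float] = []
    total = 0.0
    for rank, g in enumerate(gains, start=1):
        if rank < b:
            total += g  # ranks before b keep their full gain
        else:
            total += g / math.log(rank, b)  # later ranks are devalued
        dcg.append(total)
    return dcg


def ideal_gains(recall_base: Sequence[float], length: int) -> List[float]:
    """Hypothetical ideal ranking: judged gains in decreasing order, zero-padded."""
    ordered = [float(g) for g in sorted(recall_base, reverse=True)[:length]]
    return ordered + [0.0] * (length - len(ordered))


def normalize(vector: Sequence[float], ideal: Sequence[float]) -> List[float]:
    """Divide position by position by the ideal vector (0 where the ideal is 0)."""
    return [v / i if i > 0 else 0.0 for v, i in zip(vector, ideal)]


if __name__ == "__main__":
    # Hypothetical run: gains of the top 6 retrieved documents on a 0..3 scale.
    run = [3, 2, 3, 0, 1, 2]
    # Hypothetical recall base: gains of all relevant documents judged for the topic.
    recall_base = [3, 3, 3, 2, 2, 2, 1, 1]

    ideal = ideal_gains(recall_base, len(run))
    run_dcg = discounted_cumulated_gain(run)
    ideal_dcg = discounted_cumulated_gain(ideal)
    print("CG  ", cumulated_gain(run))
    print("DCG ", [round(x, 3) for x in run_dcg])
    print("nDCG", [round(x, 3) for x in normalize(run_dcg, ideal_dcg)])
```

With b = 2 the first two ranks are undiscounted, so DCG equals CG there; a larger b softens the discount. Later work commonly uses a different variant, with gains 2^g - 1 and a log2(i + 1) discount; this sketch does not implement that form.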


