Evaluation over thousands of queries
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Singapore, Singapore
SESSION: Evaluation--2
Pages 651-658
Year of Publication: 2008
ISBN: 978-1-60558-164-4
Authors
Ben Carterette  University of Massachusetts Amherst, Amherst, MA, USA
Virgil Pavlu  Northeastern University, Boston, MA, USA
Evangelos Kanoulas  Northeastern University, Boston, MA, USA
Javed A. Aslam  Northeastern University, Boston, MA, USA
James Allan  University of Massachusetts Amherst, Amherst, MA, USA
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 23,   Downloads (12 Months): 225,   Citation Count: 8
DOI: http://doi.acm.org/10.1145/1390334.1390445

ABSTRACT

Information retrieval evaluation has typically been performed over several dozen queries, each judged to near-completeness. There has been a great deal of recent work on evaluation over much smaller judgment sets: how to select the best set of documents to judge and how to estimate evaluation measures when few judgments are available. In light of this, it should be possible to evaluate over many more queries without much more total judging effort. The Million Query Track at TREC 2007 used two document selection algorithms to acquire relevance judgments for more than 1,800 queries. We present results of the track, along with deeper analysis: investigating tradeoffs between the number of queries and number of judgments shows that, up to a point, evaluation over more queries with fewer judgments is more cost-effective and as reliable as fewer queries with more judgments. Total assessor effort can be reduced by 95% with no appreciable increase in evaluation errors.


