Evaluation over thousands of queries
Source
|
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Singapore, Singapore
SESSION: Evaluation--2
Pages 651-658
Year of Publication: 2008
ISBN: 978-1-60558-164-4
Authors
Ben Carterette, University of Massachusetts Amherst, Amherst, MA, USA
Virgil Pavlu, Northeastern University, Boston, MA, USA
Evangelos Kanoulas, Northeastern University, Boston, MA, USA
Javed A. Aslam, Northeastern University, Boston, MA, USA
James Allan, University of Massachusetts Amherst, Amherst, MA, USA
ABSTRACT
Information retrieval evaluation has typically been performed over several dozen queries, each judged to near-completeness. There has been a great deal of recent work on evaluation over much smaller judgment sets: how to select the best set of documents to judge and how to estimate evaluation measures when few judgments are available. In light of this, it should be possible to evaluate over many more queries without much more total judging effort. The Million Query Track at TREC 2007 used two document selection algorithms to acquire relevance judgments for more than 1,800 queries. We present results of the track, along with deeper analysis: investigating tradeoffs between the number of queries and number of judgments shows that, up to a point, evaluation over more queries with fewer judgments is more cost-effective and as reliable as fewer queries with more judgments. Total assessor effort can be reduced by 95% with no appreciable increase in evaluation errors.