Statistical precision of information retrieval evaluation
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Seattle, Washington, USA
SESSION: Evaluation 2
Pages: 533 - 540
Year of Publication: 2006
ISBN: 1-59593-369-7
Bibliometrics
Downloads (6 Weeks): 18, Downloads (12 Months): 108, Citation Count: 12
ABSTRACT
We introduce and validate bootstrap techniques to compute confidence intervals that quantify the effect of test-collection variability on average precision (AP) and mean average precision (MAP) IR effectiveness measures. We consider the test collection in IR evaluation to be a representative of a population of materially similar collections, whose documents are drawn from an infinite pool with similar characteristics. Our model accurately predicts the degree of concordance between system results on randomly selected halves of the TREC-6 ad hoc corpus. We advance a framework for statistical evaluation that uses the same general framework to model other sources of chance variation as a source of input for meta-analysis techniques.
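The bootstrap idea in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure, which this record does not spell out. It assumes a plain percentile bootstrap in which the documents of one ranked list are resampled with replacement and kept in their original rank order, and it assumes every relevant document appears somewhere in the list. The function names and parameters (`average_precision`, `bootstrap_ap_interval`, `n_resamples`, `alpha`, `seed`) are hypothetical.

```python
import random


def average_precision(rels):
    """AP of a ranked list given as 0/1 relevance labels in rank order.

    Assumption: all relevant documents appear in the list, so the number
    of hits equals the number of relevant documents.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / hits if hits else 0.0


def bootstrap_ap_interval(rels, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AP (illustrative assumption only).

    Each resample draws len(rels) documents with replacement and keeps
    them in their original rank order, mimicking a collection of similar
    documents drawn from the same pool.
    """
    rng = random.Random(seed)
    n = len(rels)
    stats = []
    for _ in range(n_resamples):
        idx = sorted(rng.randrange(n) for _ in range(n))
        stats.append(average_precision([rels[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


if __name__ == "__main__":
    ranked = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # made-up relevance labels
    print(average_precision(ranked))
    print(bootstrap_ap_interval(ranked))
```

Under the same assumptions, an interval for MAP could be obtained by computing AP for every topic on each resampled collection and averaging before taking percentiles.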
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
3. Efron, B., and Tibshirani, R. J. An Introduction to the Bootstrap. Chapman and Hall, New York, 1994.
4. Fisher, R. A. Theory of statistical estimation. Proceedings of the Cambridge Philosophical Society 22 (1925), 700-725.
5. Glass, G. V. Meta-analysis at 25. http://glass.ed.asu.edu/gene/papers/meta25.html, 2000.
7. Lenhard, J. Models and statistical inference: The controversy between Fisher and Neyman-Pearson. British Journal for the Philosophy of Science (2006).
8. Rothman, K. J., and Greenland, S. Modern Epidemiology. Lippincott Williams & Wilkins, 1998.
13. Tague-Sutcliffe, J., and Blustein, J. A statistical analysis of the TREC-3 data. In Proceedings of TREC-3, The Third Information Retrieval Conference (1994), pp. 385-398.
14. Voorhees, E., and Harman, D. Overview of the Sixth Text REtrieval Conference (TREC-6). In 6th Text REtrieval Conference (Gaithersburg, MD, 1997).
16. Voorhees, E. M. Overview of the TREC-2004 robust track. In 13th Text REtrieval Conference (Gaithersburg, MD, 2004).
CITED BY 12
Ryan T. K. Lin, Justin Liang-Te Chiu, Hong-Jie Dai, Richard Tzong-Han Tsai, Min-Yuh Day, Wen-Lian Hsu, A supervised learning approach to biological question answering, Integrated Computer-Aided Engineering, v.16 n.3, p.271-281, August 2009