Where to stop reading a ranked list?: threshold optimization using truncated score distributions
Source: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval (Annual ACM Conference on Research and Development in Information Retrieval)
Boston, MA, USA
SESSION: Evaluation and measurement II
Pages 524-531
Year of Publication: 2009
ISBN: 978-1-60558-483-6
ABSTRACT
Ranked retrieval has a particular disadvantage in comparison with traditional Boolean retrieval: there is no clear cut-off point at which to stop consulting results. This is a serious problem in some setups, for instance when a system must return a set of documents rather than a ranking. We investigate and further develop methods to select the rank cut-off value that optimizes a given effectiveness measure. Assuming no input other than a system's output for a query, namely the document scores and their distribution, the task is essentially a score-distributional threshold optimization problem. The recent trend in modeling score distributions is to use a normal-exponential mixture: normal for relevant and exponential for non-relevant document scores. We discuss the two main theoretical problems with the current model, namely support incompatibility and non-convexity, and develop new models that address them. The main contributions of the paper are two truncated normal-exponential models, which differ in how they handle the score ranges excluded by truncation. We conduct a range of experiments using the TREC 2007 and 2008 Legal Track data and show that the truncated models lead to significantly better results.
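The abstract reduces cut-off selection to threshold optimization over a normal-exponential score model. What follows is a minimal sketch of that idea, not the paper's method. It assumes the mixture parameters (the hypothetical names `p_rel`, `mu`, `sigma`, `lam`) have already been fitted, for instance by EM, and that the measure to optimize is F1. Both components are truncated to the observed score range [s_min, s_max] by renormalizing their densities. This is one simple treatment of truncation and does not reproduce either of the paper's two truncated models. F1 at each rank K is estimated with a plug-in ratio of expected counts, which approximates the expected F1 rather than computing it exactly.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def truncated_normal_pdf(x, mu, sigma, s_min, s_max):
    """Normal density renormalized to the observed score range [s_min, s_max]."""
    mass = normal_cdf(s_max, mu, sigma) - normal_cdf(s_min, mu, sigma)
    return normal_pdf(x, mu, sigma) / mass


def truncated_exp_pdf(x, lam, s_min, s_max):
    """Exponential shifted to start at s_min and renormalized to [s_min, s_max]."""
    mass = 1.0 - math.exp(-lam * (s_max - s_min))
    return lam * math.exp(-lam * (x - s_min)) / mass


def best_f1_cutoff(scores, p_rel, mu, sigma, lam):
    """Return (K, estimated F1) for the rank cut-off with the highest estimated F1.

    Hypothetical sketch: p_rel (mixing weight of the relevant component),
    mu, sigma (normal, relevant) and lam (exponential, non-relevant) are
    assumed to be already-fitted parameters for this query's scores.
    """
    ranked = sorted(scores, reverse=True)
    s_min, s_max = ranked[-1], ranked[0]
    if s_max <= s_min:
        raise ValueError("need at least two distinct scores")

    # Posterior probability of relevance for each ranked document.
    posteriors = []
    for s in ranked:
        rel = p_rel * truncated_normal_pdf(s, mu, sigma, s_min, s_max)
        non = (1.0 - p_rel) * truncated_exp_pdf(s, lam, s_min, s_max)
        posteriors.append(rel / (rel + non) if rel + non > 0 else 0.0)

    expected_total_rel = sum(posteriors)  # estimate of |Rel|
    best_k, best_f1, expected_rel_at_k = 0, 0.0, 0.0
    for k, post in enumerate(posteriors, start=1):
        expected_rel_at_k += post  # estimate of |Rel & Ret| in the top k
        # F1 = 2 |Rel & Ret| / (|Ret| + |Rel|), with |Ret| = k.
        f1 = 2.0 * expected_rel_at_k / (k + expected_total_rel)
        if f1 > best_f1:
            best_k, best_f1 = k, f1
    return best_k, best_f1


scores = [10, 9.5, 9, 2, 1.5, 1, 0.5, 0.2, 0.1, 0.0]
k, f1 = best_f1_cutoff(scores, p_rel=0.3, mu=9.5, sigma=0.5, lam=1.0)
print(k, round(f1, 3))
```

Renormalizing each component by its probability mass inside [s_min, s_max] rescales its density by a constant factor. When much of the fitted normal lies outside the observed range, a situation related to the support incompatibility discussed in the abstract, that factor can noticeably shift the posteriors and therefore the chosen K.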
INDEX TERMS
Primary Classification: H. Information Systems > H.3 INFORMATION STORAGE AND RETRIEVAL > H.3.3 Information Search and Retrieval. Subjects: Information filtering
Additional Classification: H. Information Systems > H.3 INFORMATION STORAGE AND RETRIEVAL > H.3.3 Information Search and Retrieval. Subjects: Retrieval models
General Terms: Experimentation, Performance, Theory
Keywords: distributed retrieval, effectiveness measure optimization, expectation maximization, filtering, fusion, meta-search, probability of relevance, score distribution, score normalization, threshold optimization, truncated distribution