|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ABSTRACT
Text retrieval queries frequently contain named entities. The standard approach of term frequency weighting does not work well when estimating the term frequency of a named entity, since anaphoric expressions (like he, she, the movie, etc) are frequently used to refer to named entities in a document, and the use of anaphoric expressions causes the term frequency of named entities to be underestimated. In this paper, we propose a novel 2-Poisson model to estimate the frequency of anaphoric expressions of a named entity, without explicitly resolving the anaphoric expressions. Our key assumption is that the frequency of anaphoric expressions is distributed over named entities in a document according to the probabilities of whether the document is elite for the named entities. This assumption leads us to formulate our proposed Co-referentially Enhanced Entity Frequency (CEEF). Experimental results on the text collection of TREC Blog Track show that CEEF achieves significant and consistent improvements over state-of-the-art retrieval methods using standard term frequency estimation. In particular, we achieve a 3% increase of MAP over the best performing run of TREC 2008 Blog Track. REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
INDEX TERMS
Primary Classification:
Additional Classification:
General Terms:
Keywords:
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||