A 2-Poisson model for probabilistic coreference of named entities for improved text retrieval
Source
Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval table of contents
Boston, MA, USA
SESSION: Information extraction table of contents
Pages 275-282  
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Seung-Hoon Na  National University of Singapore, Singapore, Singapore
Hwee Tou Ng  National University of Singapore, Singapore, Singapore
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1571941.1571990

ABSTRACT

Text retrieval queries frequently contain named entities. Standard term frequency weighting works poorly for named entities, because documents often refer to them with anaphoric expressions (such as he, she, or the movie), so the term frequency of a named entity is underestimated. In this paper, we propose a novel 2-Poisson model to estimate the frequency of anaphoric expressions of a named entity without explicitly resolving them. Our key assumption is that the frequency of anaphoric expressions is distributed over the named entities in a document according to the probability that the document is elite for each named entity. This assumption leads us to formulate our proposed Co-referentially Enhanced Entity Frequency (CEEF). Experimental results on the TREC Blog Track text collection show that CEEF achieves significant and consistent improvements over state-of-the-art retrieval methods that use standard term frequency estimation. In particular, we achieve a 3% increase in MAP over the best-performing run of the TREC 2008 Blog Track.
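
The key assumption in the abstract can be illustrated with a small sketch. The abstract does not give the CEEF formula, so everything below is an assumption made for illustration only: the function names, the parameter values (`prior`, `lam_elite`, `lam_non_elite`), and the choice to split the anaphor count in proportion to each entity's eliteness posterior under a textbook 2-Poisson mixture. It should not be read as the authors' actual estimator.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing count k under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def eliteness_posterior(tf: int, prior: float,
                        lam_elite: float, lam_non_elite: float) -> float:
    """P(document is elite for an entity | observed tf), by Bayes' rule
    over a two-component Poisson mixture (the classic 2-Poisson setup)."""
    elite = prior * poisson_pmf(tf, lam_elite)
    non_elite = (1.0 - prior) * poisson_pmf(tf, lam_non_elite)
    return elite / (elite + non_elite)


def enhanced_entity_frequencies(entity_tf: dict, anaphor_count: int,
                                prior: float = 0.1,
                                lam_elite: float = 5.0,
                                lam_non_elite: float = 0.5) -> dict:
    """Hypothetical illustration: share the document's anaphor count among
    its named entities in proportion to their eliteness posteriors, then
    add each share to the entity's explicit term frequency.
    The default parameter values are arbitrary, not taken from the paper."""
    posteriors = {
        entity: eliteness_posterior(tf, prior, lam_elite, lam_non_elite)
        for entity, tf in entity_tf.items()
    }
    total = sum(posteriors.values())
    if total == 0:
        # No entities (or no posterior mass): nothing to redistribute.
        return {entity: float(tf) for entity, tf in entity_tf.items()}
    return {
        entity: tf + anaphor_count * posteriors[entity] / total
        for entity, tf in entity_tf.items()
    }


if __name__ == "__main__":
    # A document mentioning one entity six times and another once,
    # plus ten anaphoric expressions that are never resolved.
    print(enhanced_entity_frequencies({"Titanic": 6, "Avatar": 1},
                                      anaphor_count=10))
```

In this sketch the anaphors are never resolved individually. Their total count is redistributed, so an entity the document is probably "about" absorbs most of them, while the sum of all frequencies grows by exactly the number of anaphoric expressions.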


