Positional language models for information retrieval
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Retrieval models II
Pages 299-306
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
Yuanhua Lv, University of Illinois at Urbana-Champaign, Urbana, IL, USA
ChengXiang Zhai, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1571941.1571994

ABSTRACT

Although many variants of language models have been proposed for information retrieval, two related retrieval heuristics remain "external" to the language modeling approach: (1) the proximity heuristic, which rewards a document in which the matched query terms occur close to each other, and (2) passage retrieval, which scores a document mainly based on its best-matching passage. Existing studies have only attempted to use a standard language model as a "black box" to implement these heuristics, which makes it hard to optimize the combination parameters.

In this paper, we propose a novel positional language model (PLM) that implements both heuristics in a unified language model. The key idea is to define a language model for each position of a document and to score a document based on the scores of its PLMs. The PLM is estimated from counts of words propagated within a document through a proximity-based density function, which both captures the proximity heuristic and achieves a "soft" passage retrieval effect. We propose and study several representative density functions and several PLM-based document ranking strategies. Experimental results on standard TREC test collections show that the PLM is effective for passage retrieval and performs better than a state-of-the-art proximity-based retrieval model.
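To make the estimation and ranking steps more concrete, here is a minimal Python sketch. It is an illustration, not the authors' implementation. It assumes a Gaussian kernel as the proximity-based density function, Dirichlet-prior smoothing against a background model (the hypothetical `background` dict, with unseen terms falling back to `floor`), and query log-likelihood as the per-position score. All function and parameter names (`sigma`, `mu`, `floor`, `top_k`) are hypothetical. A document is ranked by the mean of its `top_k` best position scores, so `top_k=1` corresponds to a best-position strategy.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def gaussian_kernel(i: int, j: int, sigma: float) -> float:
    """Weight with which a word at position j propagates to position i.

    Assumption: a Gaussian kernel; another proximity-based density
    function could be substituted here.
    """
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))


def propagated_counts(doc: List[str], i: int, sigma: float) -> Counter:
    """c'(w, i): each occurrence of w at position j adds kernel(i, j) to w's count."""
    counts: Counter = Counter()
    for j, word in enumerate(doc):
        counts[word] += gaussian_kernel(i, j, sigma)
    return counts


def position_score(doc: List[str], query: List[str], i: int, sigma: float,
                   background: Dict[str, float], mu: float, floor: float) -> float:
    """Query log-likelihood under the positional model at position i.

    Assumption: Dirichlet-prior smoothing against a background model;
    terms missing from `background` get probability `floor`.
    """
    counts = propagated_counts(doc, i, sigma)
    total = sum(counts.values())  # total propagated mass at position i
    score = 0.0
    for term in query:
        p_background = background.get(term, floor)
        score += math.log((counts[term] + mu * p_background) / (total + mu))
    return score


def plm_document_score(doc: List[str], query: List[str], sigma: float = 2.0,
                       background: Optional[Dict[str, float]] = None,
                       mu: float = 10.0, floor: float = 1e-3,
                       top_k: int = 1) -> float:
    """Mean of the document's top_k position scores (top_k=1: best position)."""
    if not doc:
        return float("-inf")
    background = background or {}
    scores = sorted(
        (position_score(doc, query, i, sigma, background, mu, floor)
         for i in range(len(doc))),
        reverse=True,
    )
    best = scores[:max(1, top_k)]
    return sum(best) / len(best)


if __name__ == "__main__":
    filler = ["w%d" % n for n in range(8)]
    query = ["language", "model"]
    doc_a = ["language", "model"] + filler      # query terms adjacent
    doc_b = ["language"] + filler + ["model"]   # query terms 9 positions apart
    print(plm_document_score(doc_a, query), plm_document_score(doc_b, query))
```

Both toy documents contain the same words and have the same length, so only the distance between `language` and `model` differs; the adjacent pair in `doc_a` should receive the higher score. The sketch recomputes propagated counts for every position, which costs O(n²) per document of length n; it is meant to show the idea, not to be efficient.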


