| Positional language models for information retrieval |
Source
|
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Retrieval models II
Pages 299-306
Year of Publication: 2009
ISBN:978-1-60558-483-6
|
|
Authors
|
|
Yuanhua Lv
|
University of Illinois at Urbana-Champaign, Urbana, IL, USA
|
|
ChengXiang Zhai
|
University of Illinois at Urbana-Champaign, Urbana, IL, USA
|
|
ABSTRACT
Although many variants of language models have been proposed for information retrieval, two related retrieval heuristics remain "external" to the language modeling approach: (1) the proximity heuristic, which rewards a document in which the matched query terms occur close to each other; and (2) passage retrieval, which scores a document mainly based on its best-matching passage. Existing studies have only used a standard language model as a "black box" to implement these heuristics, which makes it hard to optimize the combination parameters. In this paper, we propose a novel positional language model (PLM) that implements both heuristics in a unified language model. The key idea is to define a language model for each position of a document and to score a document based on the scores of its PLMs. The PLM is estimated from word counts propagated within a document through a proximity-based density function, which both captures the proximity heuristic and achieves an effect of "soft" passage retrieval. We propose and study several representative density functions and several different PLM-based document ranking strategies. Experimental results on standard TREC test collections show that the PLM is effective for passage retrieval and performs better than a state-of-the-art proximity-based retrieval model.
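The following is a minimal sketch of the idea described in the abstract, not the authors' implementation. It assumes a Gaussian density function as one possible proximity-based kernel, a simple additive smoothing constant (`eps`), and a "best position" ranking strategy that scores a document by its highest-scoring position. The function names, parameters, and smoothing scheme are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math
from collections import defaultdict


def gaussian_kernel(i, j, sigma):
    """Weight that a word at position j propagates to position i (assumed kernel)."""
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))


def positional_lm(tokens, i, sigma=2.0):
    """Estimate p(w | D, i) from propagated counts, normalized at position i."""
    propagated = defaultdict(float)
    for j, word in enumerate(tokens):
        propagated[word] += gaussian_kernel(i, j, sigma)
    total = sum(propagated.values())
    return {word: count / total for word, count in propagated.items()}


def score_position(query, plm, vocab_size, eps=0.01):
    """Query log-likelihood under one PLM, with additive smoothing (an assumption)."""
    denom = 1.0 + eps * vocab_size
    return sum(math.log((plm.get(q, 0.0) + eps) / denom) for q in query)


def score_document(query, tokens, sigma=2.0):
    """Best-position strategy (assumed): take the maximum score over all positions."""
    vocab_size = len(set(tokens) | set(query))
    return max(
        score_position(query, positional_lm(tokens, i, sigma), vocab_size)
        for i in range(len(tokens))
    )


# Two toy documents with identical word counts but different term proximity.
near = "a b x x x x x x x x".split()
far = "a x x x x x x x x b".split()
query = ["a", "b"]
near_score = score_document(query, near)
far_score = score_document(query, far)
```

Because both toy documents contain the same bag of words, a standard unigram document model would score them identically. Under this sketch, the document in which the query terms are adjacent scores higher, which is the proximity effect the abstract describes. Taking the maximum over positions acts like selecting the best passage without fixing passage boundaries in advance.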