ABSTRACT
Recent work on supervised learning of term-based retrieval models has shown that significantly improved accuracy can often be achieved via better model estimation. In this paper, we show that retrieval accuracy with Metzler and Croft's Markov random field (MRF) approach can be similarly improved via supervised learning. While the original MRF method estimates a parameter for each of its three feature classes from data, parameters within each class are set via a uniform weighting scheme adopted from the standard unigram model. We conjecture that greater MRF retrieval accuracy should be possible by better estimating within-class parameters, particularly for verbose queries employing natural language terms. Retrieval experiments with these queries on three TREC document collections show that our improved MRF consistently outperforms both the original MRF and supervised unigram baselines. Additional experiments using blind feedback and evaluation with optimal weighting demonstrate both the immediate value and further potential of our method.