ABSTRACT
The proximity of query terms in a document is important information that lets ranking models go beyond the "bag of words" assumption in information retrieval. This paper studies the integration of term proximity information into unigram language modeling. A new proximity language model (PLM) is proposed, which treats each query term's proximity centrality as a Dirichlet hyper-parameter that weights the parameters of the unigram document language model. Several proximity measures are developed for use in PLM, each computing a query term's proximity centrality in a specific document. In experiments, the proximity language model is compared with the basic language model and with previous work that combines proximity information with the language model through linear score combination. The experimental results show that the proposed model performs better in both top precision and average precision.
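To illustrate the idea of proximity acting as an extra Dirichlet pseudo-count, the following is a minimal sketch and not the paper's exact formulation. It assumes a Dirichlet-smoothed unigram model in which each query term's count is augmented by a proximity centrality score. The function names, the exponential-decay centrality, and the parameters `mu`, `lam`, and `decay` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def proximity_centrality(doc_tokens, query_terms, decay=1.5):
    """Illustrative centrality (an assumption, not the paper's measure):
    for each query term, sum decay**(-d) over the other query terms in the
    document, where d is the minimum positional distance between the two."""
    positions = {}
    for i, tok in enumerate(doc_tokens):
        if tok in query_terms:
            positions.setdefault(tok, []).append(i)

    centrality = {}
    for t in query_terms:
        score = 0.0
        for u in query_terms:
            if u == t or t not in positions or u not in positions:
                continue
            d = min(abs(i - j) for i in positions[t] for j in positions[u])
            score += decay ** (-d)
        centrality[t] = score
    return centrality


def plm_log_score(doc_tokens, query_terms, collection_prob, mu=2000.0, lam=5.0):
    """Query log-likelihood under a Dirichlet-smoothed unigram model whose
    pseudo-counts for query terms are increased by lam * centrality.
    mu and lam are hypothetical tuning parameters."""
    counts = Counter(doc_tokens)
    prox = proximity_centrality(doc_tokens, set(query_terms), decay=1.5)

    # The added proximity mass goes into the denominator so that the
    # smoothed distribution still sums to one over the vocabulary.
    denom = len(doc_tokens) + mu + lam * sum(prox.values())

    score = 0.0
    for t in query_terms:
        numer = counts[t] + mu * collection_prob[t] + lam * prox[t]
        score += math.log(numer / denom)
    return score
```

In this sketch, two documents with identical term counts receive different scores when the query terms sit closer together in one of them. Setting `lam=0` reduces it to plain Dirichlet smoothing.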