ABSTRACT
This paper presents a new dependence language modeling approach to information retrieval. The approach extends the basic unigram language modeling approach by relaxing its term independence assumption. We introduce the linkage of a query as a hidden variable, which expresses the term dependencies within the query as an acyclic, planar, undirected graph. We then assume that a query is generated from a document in two stages: the linkage is generated first, and then each term is generated in turn, conditioned on the other terms it is related to under the linkage. We also present a smoothing method for model parameter estimation and an approach to learning the linkage of a sentence in an unsupervised manner. The new approach is compared to the classical probabilistic retrieval model and to previously proposed language models, both with and without term dependencies. Results show that our model achieves substantial and significant improvements on TREC collections.
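The two-stage generation described in the abstract can be written as a marginalization over the hidden linkage. This is a sketch inferred from the abstract alone, not necessarily the paper's exact formulation; the symbols Q (query), D (document), and L (linkage) are notation assumed here.

```latex
P(Q \mid D) = \sum_{L} P(L \mid D)\, P(Q \mid L, D)
```

Under this reading, P(L | D) corresponds to the first stage (generating the linkage) and P(Q | L, D) to the second (generating each term conditioned on the terms it is linked to). A linkage with no links would leave each term conditioned only on D, which recovers the unigram product as a special case.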