ABSTRACT
Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech recognition. The basic idea of these approaches is to estimate a language model for each document, and then rank documents by the likelihood of the query according to the estimated language model. A core problem in language model estimation is smoothing, which adjusts the maximum likelihood estimator so as to correct the inaccuracy due to data sparseness. In this paper, we study the problem of language model smoothing and its influence on retrieval performance. We examine the sensitivity of retrieval performance to the smoothing parameters and compare several popular smoothing methods on different test collections.
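Below is a minimal Python sketch of the ranking scheme described above. It estimates a language model for each document, smooths it, and scores each document by the log-likelihood of the query. It assumes whitespace-tokenized text and a collection model p(w|C) estimated by maximum likelihood over all documents. It uses two standard smoothing methods, Jelinek-Mercer linear interpolation and Dirichlet-prior smoothing; the abstract itself does not name the methods it compares. The function names and the default parameter values (λ = 0.1, μ = 2000) are hypothetical, chosen for illustration, and are not taken from the paper.

```python
# Illustrative sketch only: function names and default parameters are
# assumptions for demonstration, not the paper's implementation.
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def collection_model(docs: Dict[str, List[str]]) -> Dict[str, float]:
    """Maximum likelihood estimate of p(w|C) from all tokens in the collection."""
    counts = Counter(tok for toks in docs.values() for tok in toks)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jelinek_mercer(tf: int, doc_len: int, p_c: float, lam: float) -> float:
    """Interpolate the document ML estimate tf/|d| with p(w|C); needs 0 < lam <= 1."""
    p_ml = tf / doc_len if doc_len else 0.0
    return (1.0 - lam) * p_ml + lam * p_c


def dirichlet(tf: int, doc_len: int, p_c: float, mu: float) -> float:
    """Dirichlet-prior smoothing: (tf + mu * p(w|C)) / (|d| + mu); needs mu > 0."""
    return (tf + mu * p_c) / (doc_len + mu)


SMOOTHERS = {"jm": jelinek_mercer, "dirichlet": dirichlet}
DEFAULT_PARAMS = {"jm": 0.1, "dirichlet": 2000.0}  # hypothetical defaults


def rank(query: List[str], docs: Dict[str, List[str]],
         method: str = "dirichlet",
         param: Optional[float] = None) -> List[Tuple[str, float]]:
    """Rank documents by smoothed query log-likelihood, highest score first."""
    if method not in SMOOTHERS:
        raise ValueError(f"unknown smoothing method: {method!r}")
    smooth = SMOOTHERS[method]
    if param is None:
        param = DEFAULT_PARAMS[method]
    p_coll = collection_model(docs)
    scores = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for w in query:
            p_c = p_coll.get(w, 0.0)
            if p_c == 0.0:
                # Unseen in the whole collection: every document would get
                # probability 0, so the term cannot discriminate and is skipped.
                continue
            score += math.log(smooth(tf[w], len(toks), p_c, param))
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

As a usage example, take `docs = {"d1": "language model smoothing for retrieval".split(), "d2": "speech recognition language model".split(), "d3": "query expansion with relevance feedback".split()}` and the query `"smoothing retrieval".split()`. Then `rank(query, docs, param=10.0)` places d1, the only document containing the query terms, first. Re-running `rank` over a grid of `param` values is the simplest form of the parameter-sensitivity check the abstract mentions. Measuring actual retrieval performance would also require relevance judgments.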
CITED BY 176
Luo Si, Rong Jin, Jamie Callan, Paul Ogilvie, A language modeling framework for resource selection and results merging, Proceedings of the eleventh international conference on Information and knowledge management, November 04-09, 2002, McLean, Virginia, USA
Donald Metzler, Yaniv Bernstein, W. Bruce Croft, Alistair Moffat, Justin Zobel, Similarity measures for tracking information flow, Proceedings of the 14th ACM international conference on Information and knowledge management, October 31-November 05, 2005, Bremen, Germany
András Kornai, Marc Krellenstein, Michael Mulligan, David Twomey, Fruzsina Veress, Alec Wysoker, Classifying the Hungarian web, Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics, April 12-17, 2003, Budapest, Hungary
Shuming Shi, Ji-Rong Wen, Qing Yu, Ruihua Song, Wei-Ying Ma, Gravitation-based model for information retrieval, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil
Jing Bai, Dawei Song, Peter Bruza, Jian-Yun Nie, Guihong Cao, Query expansion using term relationships in language models for information retrieval, Proceedings of the 14th ACM international conference on Information and knowledge management, October 31-November 05, 2005, Bremen, Germany
Xiaohua Zhou, Xiaohua Hu, Xiaodan Zhang, Xia Lin, Il-Yeol Song, Context-sensitive semantic smoothing for the language modeling approach to genomic IR, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA
Jiwoon Jeon, W. Bruce Croft, Joon Ho Lee, Soyeon Park, A framework to predict the quality of answers with non-textual features, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA
Zhang Jun-lin, Sun Le, Qu Wei-min, Sun Yu-fang, A trigger language model-based IR system, Proceedings of the 20th international conference on Computational Linguistics, p.680-es, August 23-27, 2004, Geneva, Switzerland
Tao Tao, Xuanhui Wang, Qiaozhu Mei, ChengXiang Zhai, Language model information retrieval with document expansion, Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, p.407-414, June 04-09, 2006, New York, New York
Xirong Li, Dong Wang, Jianmin Li, Bo Zhang, Video search in concept subspace: a text-like paradigm, Proceedings of the 6th ACM international conference on Image and video retrieval, p.603-610, July 09-11, 2007, Amsterdam, The Netherlands
Victor Lavrenko, James Allan, Edward DeGuzman, Daniel LaFlamme, Veera Pollard, Stephen Thomas, Relevance models for topic detection and tracking, Proceedings of the second international conference on Human Language Technology Research, March 24-27, 2002, San Diego, California
Lei Wu, Zhiwei Li, Mingjing Li, Wei-Ying Ma, Nenghai Yu, Mutually beneficial learning with application to on-line news classification, Proceedings of the ACM first Ph.D. workshop in CIKM, November 09, 2007, Lisbon, Portugal
Giorgos Giannopoulos, Theodore Dalamagas, Magdalini Eirinaki, Timos Sellis, Boosting the ranking function learning process using clustering, Proceeding of the 10th ACM workshop on Web information and data management, October 30, 2008, Napa Valley, California, USA
Jianhan Zhu, Jun Wang, Ingemar J. Cox, Michael J. Taylor, Risky business: modeling and exploiting uncertainty in information retrieval, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA
Young-In Song, Chin-Yew Lin, Yunbo Cao, Hae-Chang Rim, Question utility: a novel static ranking of question search, Proceedings of the 23rd national conference on Artificial intelligence, p.1231-1236, July 13-17, 2008, Chicago, Illinois
Yunzhang Zhu, Gang Wang, Junli Yang, Dakan Wang, Jun Yan, Zheng Chen, Revenue optimization with relevance constraint in sponsored search, Proceedings of the Third International Workshop on Data Mining and Audience Intelligence for Advertising, p.55-60, June 28, 2009, Paris, France