A general language model for information retrieval
Source
Conference on Information and Knowledge Management
Proceedings of the eighth international conference on Information and knowledge management
Kansas City, Missouri, United States
Pages: 316 - 321
Year of Publication: 1999
ISBN: 1-58113-146-1
Authors
Fei Song, Dept. of Computing and Info. Science, University of Guelph, Guelph, Ontario, Canada N1G 2W1
W. Bruce Croft, Dept. of Computer Science, University of Massachusetts, Amherst, Massachusetts
Bibliometrics
Downloads (6 Weeks): 23, Downloads (12 Months): 180, Citation Count: 48
ABSTRACT
Statistical language modeling has been successfully used for speech recognition, part-of-speech tagging, and syntactic parsing. Recently, it has also been applied to information retrieval. According to this new paradigm, each document is viewed as a language sample, and a query as a generation process. The retrieved documents are ranked based on the probabilities of producing a query from the corresponding language models of these documents. In this paper, we present a new language model for information retrieval, which is based on a range of data smoothing techniques, including the Good-Turing estimate, curve-fitting functions, and model combinations. Our model is conceptually simple and intuitive, and can be easily extended to incorporate probabilities of phrases such as word pairs and word triples. Experiments with the Wall Street Journal and TREC4 data sets showed that the performance of our model is comparable to that of INQUERY and better than that of another language model for information retrieval. In particular, word pairs are shown to be useful in improving the retrieval performance.
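The ranking idea in the abstract (score each document by the probability that its language model generates the query) can be illustrated with a minimal Python sketch. This is not the authors' model: simple linear interpolation between a document model and a corpus model stands in for the paper's Good-Turing and curve-fitting smoothing, and the function names, the mixing weight `lam`, the `floor` value, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities for a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def query_log_likelihood(query, doc_model, corpus_model, lam=0.5, floor=1e-9):
    """log P(query | document) under a document/corpus mixture.

    lam (mixing weight) and floor (probability for terms unseen in the
    corpus) are illustrative values, not taken from the paper.
    """
    score = 0.0
    for term in query:
        p_doc = doc_model.get(term, 0.0)
        p_corpus = corpus_model.get(term, floor)
        # Combining the two models keeps the probability non-zero
        # even when the document never mentions the query term.
        score += math.log(lam * p_doc + (1.0 - lam) * p_corpus)
    return score


def rank(query, docs, lam=0.5):
    """Return document ids sorted by descending query likelihood."""
    corpus_model = unigram_model([t for toks in docs.values() for t in toks])
    scores = {
        doc_id: query_log_likelihood(query, unigram_model(toks), corpus_model, lam)
        for doc_id, toks in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "stock prices fell on wall street".split(),
    "d2": "language models rank documents for retrieval".split(),
    "d3": "the street market opened late".split(),
}
ranking = rank("language retrieval".split(), docs)
```

In this toy collection, `d2` is the only document containing both query terms, so it is ranked first. Extending the sketch to word pairs would mean building a second model over adjacent-token pairs and mixing it in the same way; how the paper weights such models is not stated in the abstract.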
CITED BY 48
Luo Si, Rong Jin, Jamie Callan, Paul Ogilvie, A language modeling framework for resource selection and results merging, Proceedings of the eleventh international conference on Information and knowledge management, November 04-09, 2002, McLean, Virginia, USA
Shuming Shi, Ji-Rong Wen, Qing Yu, Ruihua Song, Wei-Ying Ma, Gravitation-based model for information retrieval, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil
Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan, Tat-Seng Chua, Question answering passage retrieval using dependency relations, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil
Jack G. Conrad, Xi S. Guo, Peter Jackson, Monem Meziou, Database selection using actual physical and acquired logical collection resources in a massive domain-specific operational environment, Proceedings of the 28th international conference on Very Large Data Bases, p.71-82, August 20-23, 2002, Hong Kong, China
Xiangdong Zhou, Mei Wang, Qi Zhang, Junqi Zhang, Baile Shi, Automatic image annotation by an iterative approach: incorporating keyword correlations and region matching, Proceedings of the 6th ACM international conference on Image and video retrieval, p.25-32, July 09-11, 2007, Amsterdam, The Netherlands
Hao Lang, Bin Wang, Gareth Jones, Jin-Tao Li, Fan Ding, Yi-Xuan Liu, Query performance prediction for information retrieval based on covering topic score, Journal of Computer Science and Technology, v.23 n.4, p.590-601, July 2008
INDEX TERMS
Primary Classification: H. Information Systems; H.3 INFORMATION STORAGE AND RETRIEVAL; H.3.3 Information Search and Retrieval
Subjects: Retrieval models
Additional Classification: H. Information Systems; H.2 DATABASE MANAGEMENT
General Terms: Design, Documentation, Experimentation, Languages, Management, Measurement, Performance, Theory
Keywords: curve-fitting functions, Good-Turing estimate, model combinations, statistical language modeling