A general language model for information retrieval
Source: Proceedings of the Eighth International Conference on Information and Knowledge Management (CIKM)
Kansas City, Missouri, United States
Pages: 316 - 321  
Year of Publication: 1999
ISBN:1-58113-146-1
Authors
Fei Song  Dept. of Computing and Info. Science, University of Guelph, Guelph, Ontario, Canada N1G 2W1
W. Bruce Croft  Dept. of Computer Science, University of Massachusetts, Amherst, Massachusetts
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGIR: ACM Special Interest Group on Information Retrieval
SIGMIS: ACM Special Interest Group on Management Information Systems
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/319950.320022

ABSTRACT

Statistical language modeling has been used successfully for speech recognition, part-of-speech tagging, and syntactic parsing. Recently, it has also been applied to information retrieval. Under this new paradigm, each document is viewed as a language sample, and a query as a generation process. Retrieved documents are ranked by the probability of generating the query from each document's language model. In this paper, we present a new language model for information retrieval, based on a range of data smoothing techniques, including the Good-Turing estimate, curve-fitting functions, and model combinations. Our model is conceptually simple and intuitive, and can be easily extended to incorporate probabilities of phrases such as word pairs and word triples. Experiments with the Wall Street Journal and TREC4 data sets showed that the performance of our model is comparable to that of INQUERY and better than that of another language model for information retrieval. In particular, word pairs are shown to be useful in improving retrieval performance.
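
The ranking idea in the abstract, scoring each document by the probability that its language model generates the query, can be illustrated with a minimal sketch. This is not the authors' model: it assumes a unigram model and replaces the paper's Good-Turing and curve-fitting smoothing with a simple linear interpolation between document and corpus estimates. The function name `rank_by_query_likelihood` and the weight `lam` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def rank_by_query_likelihood(docs, query, lam=0.5):
    """Rank documents by log P(query | document model), best first.

    Assumption (not from the paper): the document model is a linear mix
    of the document's maximum-likelihood unigram estimate and a
    corpus-wide unigram estimate, weighted by `lam`.
    """
    # Naive whitespace tokenization; a real system would do more.
    doc_counts = [Counter(d.lower().split()) for d in docs]

    # Corpus model: term counts pooled over all documents.
    corpus_counts = Counter()
    for counts in doc_counts:
        corpus_counts.update(counts)
    corpus_total = sum(corpus_counts.values())

    scores = []
    for idx, counts in enumerate(doc_counts):
        doc_total = sum(counts.values())
        log_p = 0.0
        for term in query.lower().split():
            p_doc = counts[term] / doc_total if doc_total else 0.0
            p_corpus = corpus_counts[term] / corpus_total if corpus_total else 0.0
            p = lam * p_doc + (1 - lam) * p_corpus
            if p == 0.0:
                # Term unseen in the whole corpus: the query cannot be generated.
                log_p = float("-inf")
                break
            log_p += math.log(p)
        scores.append((idx, log_p))

    # Higher log-probability means a better match.
    return sorted(scores, key=lambda s: s[1], reverse=True)


docs = [
    "stock market prices fell",
    "the cat sat on the mat",
    "market news and stock reports",
]
ranking = rank_by_query_likelihood(docs, "stock market")
```

Mixing in the corpus estimate keeps a document from receiving zero probability just because it lacks one query term, which is the role smoothing plays in this family of models. Extending the sketch to word pairs would mean adding a bigram estimate as a further component of the mix.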


