ABSTRACT
This paper follows a formal approach to information retrieval based on statistical language models. Through some simple reformulations of the basic language modeling approach, we introduce the notion of the importance of a query term. The importance of a query term is an unknown parameter that explicitly models which of the query terms are generated from the relevant documents (the important terms) and which are not (the unimportant terms). The new language modeling approach is shown to explain a number of practical facts about today's information retrieval systems that are not well explained by current information retrieval theory, including stop words, mandatory terms, coordination level ranking, and retrieval using phrases.
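As a rough illustration of the idea of a per-term importance parameter, the sketch below scores a document with a linearly interpolated unigram language model in which each query term has its own mixing weight. The abstract does not state the paper's formulas, so the interpolation form, the function name `score`, and the parameter name `importance` are assumptions made for illustration only. Under this assumed form, a weight of 0 makes a term behave like a stop word (it contributes the same amount to every document), and a weight of 1 makes it behave like a mandatory term (documents that lack it get zero probability).

```python
import math
from collections import Counter


def score(query_terms, doc_tokens, collection_tokens, importance):
    """Log-probability of the query under an assumed per-term mixture model.

    importance[term] is a hypothetical weight in [0, 1]:
      0 -> the term is explained by the collection model only,
      1 -> the term is explained by the document model only.
    """
    doc_tf = Counter(doc_tokens)
    coll_tf = Counter(collection_tokens)
    doc_len = len(doc_tokens)
    coll_len = len(collection_tokens)

    log_p = 0.0
    for term in query_terms:
        lam = importance[term]
        # Maximum-likelihood estimates for the document and the collection.
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_coll = coll_tf[term] / coll_len if coll_len else 0.0
        p = lam * p_doc + (1.0 - lam) * p_coll
        if p == 0.0:
            # The term cannot be generated at all: the document is excluded.
            return float("-inf")
        log_p += math.log(p)
    return log_p


# Toy data, invented for this example.
d1 = ["language", "model", "retrieval"]
d2 = ["the", "the", "cat"]
collection = d1 + d2

weights = {"the": 0.0, "retrieval": 1.0}
s1 = score(["the", "retrieval"], d1, collection, weights)
s2 = score(["the", "retrieval"], d2, collection, weights)
```

In this toy setup `d2` does not contain "retrieval", which has weight 1, so it is excluded from the ranking, while "the", with weight 0, does not distinguish between the two documents.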