Term-specific smoothing for the language modeling approach to information retrieval: the importance of a query term
Full text: PDF (190 KB)
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Tampere, Finland
Session: Information Retrieval Theory
Pages: 35-41
Year of Publication: 2002
ISBN: 1-58113-561-0
Author: Djoerd Hiemstra, University of Twente, The Netherlands
Sponsor: SIGIR: ACM Special Interest Group on Information Retrieval
Publisher: ACM, New York, NY, USA
Bibliometrics: Downloads (6 weeks): 17; Downloads (12 months): 102; Citation count: 12
DOI: http://doi.acm.org/10.1145/564376.564385

ABSTRACT

This paper follows a formal approach to information retrieval based on statistical language models. Through some simple reformulations of the basic language modeling approach, we introduce the notion of the importance of a query term. The importance of a query term is an unknown parameter that explicitly models which of the query terms are generated from the relevant documents (the important terms) and which are not (the unimportant terms). The new language modeling approach is shown to explain a number of practical facts of today's information retrieval systems that current information retrieval theory does not explain well, including stop words, mandatory terms, coordination level ranking, and retrieval using phrases.
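The abstract describes term importance only in words. The following is a minimal sketch that assumes the per-term linear interpolation form P(T_1..T_n | D) = prod_i [(1 - λ_i) P(T_i | C) + λ_i P(T_i | D)]. Here λ_i is the importance of query term T_i, and P(T_i | C) is its probability in the whole collection. Under that assumption, it shows how the value of λ_i can produce several of the behaviours listed. λ_i = 0 makes a term act like a stop word. λ_i = 1 makes it mandatory. λ_i close to 1 for every term gives coordination level ranking. The toy collection, the function names `log_score` and `rank`, and the hand-set λ values are all hypothetical. This is not presented as the paper's own implementation, and phrase retrieval is not covered.

```python
import math
from collections import Counter

# Hypothetical toy collection, for illustration only.
DOCS = {
    "d1": ["cat"] * 8,                    # only "cat", high term frequency
    "d2": ["cat", "dog"] + ["fish"] * 6,  # both query terms, low frequency
    "d3": ["dog"] * 6 + ["fish"] * 2,     # only "dog"
}

# Collection statistics for the background model P(T | C).
COLL_TF = Counter(t for doc in DOCS.values() for t in doc)
COLL_LEN = sum(COLL_TF.values())


def log_score(query, doc, lambdas):
    """Return log P(query | doc) with one smoothing weight per query term.

    Assumed form (a sketch): prod_i [(1 - l_i) * P(T_i | C) + l_i * P(T_i | D)],
    where l_i = lambdas[T_i] is the hand-set importance of query term T_i.
    """
    doc_tf = Counter(doc)
    total = 0.0
    for term in query:
        lam = lambdas[term]
        p_doc = doc_tf[term] / len(doc) if doc else 0.0
        p_coll = COLL_TF[term] / COLL_LEN
        p = (1.0 - lam) * p_coll + lam * p_doc
        if p == 0.0:
            # Happens e.g. when a mandatory term (l_i = 1) is absent from doc.
            return float("-inf")
        total += math.log(p)
    return total


def rank(query, lambdas):
    """Return document ids sorted by descending log score."""
    scores = {name: log_score(query, doc, lambdas) for name, doc in DOCS.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Moderate weights: a high frequency of one term can outweigh matching
# both terms, so d2 ranks last.
print(rank(["cat", "dog"], {"cat": 0.5, "dog": 0.5}))      # ['d1', 'd3', 'd2']
# Weights near 1: the document matching more query terms ranks first
# (coordination level ranking).
print(rank(["cat", "dog"], {"cat": 0.999, "dog": 0.999}))  # ['d2', 'd1', 'd3']
# "dog" mandatory (l = 1): d1 lacks it and drops to the bottom.
print(rank(["cat", "dog"], {"cat": 0.5, "dog": 1.0}))      # ['d3', 'd2', 'd1']
# "fish" unimportant (l = 0): it adds the same log P(fish | C) to every
# document, so the ranking matches the two-term query.
print(rank(["cat", "dog", "fish"], {"cat": 0.5, "dog": 0.5, "fish": 0.0}))  # ['d1', 'd3', 'd2']
```

The abstract treats λ_i as an unknown parameter. The hand-set values here only stand in for whatever estimate a real system would use. The `-inf` guard is a choice made for this sketch so that a missing mandatory term does not raise a math domain error in `math.log`.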

