A formal study of information retrieval heuristics
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Sheffield, United Kingdom
SESSION: Formal models-1
Pages: 49-56
Year of Publication: 2004
ISBN: 1-58113-881-4
Authors
Hui Fang, University of Illinois at Urbana-Champaign, Urbana, IL
Tao Tao, University of Illinois at Urbana-Champaign, Urbana, IL
ChengXiang Zhai, University of Illinois at Urbana-Champaign, Urbana, IL
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 27,   Downloads (12 Months): 247,   Citation Count: 29
DOI: http://doi.acm.org/10.1145/1008992.1009004

ABSTRACT

Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. One basic research question is thus what exactly are these "necessary" heuristics that seem to cause good retrieval performance. In this paper, we present a formal study of retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a constraint is not satisfied, it often indicates non-optimality of the method, and when a constraint is satisfied only for a certain range of parameter values, the method's performance tends to be poor when the parameter is outside that range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies these constraints. Thus the proposed constraints provide a good explanation of many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.
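
To make the idea of checking a constraint on a retrieval function concrete, here is a minimal sketch in Python. The abstract does not spell out the constraints themselves, so the one used here is an illustrative assumption: among documents of equal length, a document containing more occurrences of a query term should score strictly higher. The function names (`tfidf_score`, `binary_score`, `satisfies_tf_constraint`) and the toy scoring formulas are hypothetical and are not taken from the paper. The check is numerical over a small range, not an analytical proof.

```python
import math


def tfidf_score(query, doc, doc_freq, num_docs):
    """Toy TF-IDF score (assumed form): sum of (1 + ln tf) * ln((N + 1) / df)."""
    score = 0.0
    for term in query:
        tf = doc.count(term)
        if tf == 0:
            continue
        idf = math.log((num_docs + 1) / doc_freq[term])
        score += (1.0 + math.log(tf)) * idf
    return score


def binary_score(query, doc, doc_freq, num_docs):
    """Toy binary-matching score: counts IDF once per matched term, ignoring tf."""
    return sum(
        math.log((num_docs + 1) / doc_freq[term])
        for term in query
        if term in doc
    )


def satisfies_tf_constraint(score_fn, term, doc_len, doc_freq, num_docs):
    """Numerically check the assumed constraint for a single-term query.

    Builds equal-length documents with 1..doc_len occurrences of `term`
    and verifies that the score strictly increases with the term count.
    """
    previous = None
    for tf in range(1, doc_len + 1):
        doc = [term] * tf + ["filler"] * (doc_len - tf)
        current = score_fn([term], doc, doc_freq, num_docs)
        if previous is not None and current <= previous:
            return False
        previous = current
    return True


if __name__ == "__main__":
    stats = {"ir": 3}  # hypothetical document frequency for the term "ir"
    print(satisfies_tf_constraint(tfidf_score, "ir", 10, stats, 100))
    print(satisfies_tf_constraint(binary_score, "ir", 10, stats, 100))
```

Under these assumptions the toy TF-IDF function passes the check, while binary matching fails it because extra occurrences of the term never change the score. This mirrors, in a very reduced form, the kind of analysis the abstract describes.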


