ACM Home Page
Please provide us with feedback. Feedback
A study of parameter tuning for term frequency normalization
Full text PdfPdf (152 KB)
Source Conference on Information and Knowledge Management archive
Proceedings of the twelfth international conference on Information and knowledge management table of contents
New Orleans, LA, USA
SESSION: Information retrieval session 1: adhoc retrieval table of contents
Pages: 10 - 16  
Year of Publication: 2003
ISBN:1-58113-723-0
Authors
Ben HE  University of Glasgow, Glasgow, UK
Iadh Ounis  University of Glasgow, Glasgow, UK
Sponsors
ACM: Association for Computing Machinery
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 64,   Citation Count: 8
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/956863.956867
What is a DOI?

ABSTRACT

Most current term frequency normalization approaches for information retrieval involve the use of parameters. The tuning of these parameters has an important impact on the overall performance of the information retrieval system. Indeed, a small variation in the involved parameter(s) could lead to an important variation in the precision/recall values. Most current tuning approaches are dependent on the document collections. As a consequence, the effective parameter value cannot be obtained for a given new collection without extensive training data. In this paper, we propose a novel and robust method for the tuning of term frequency normalization parameter(s), by measuring the normalization effect on the within document frequency of the query terms. As an illustration, we apply our method on Amati \& Van Rijsbergen's so-called normalization 2. The experiments for the ad-hoc TREC-6,7,8 tasks and TREC-8,9,10 Web tracks show that the new method is independent of the collections and able to provide reliable and good performance.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
G. Amati. Probabilistic Models for Information Retrieval based on Divergence from Randomness. PhD thesis, Department of Computing Science, University of Glasgow, 2003.
2
 
3
4
 
5
 
6
S. Robertson, S. Walker, M. M. Beaulieu, M. Gatford, and A. Payne. Okapi at trec-4. In NIST Special Publication 500-236: The Fourth Text REtrieval Conference (TREC-4), pages 73--96, 1995.
7
8
 
9
10

CITED BY  8