Corpus structure, language models, and ad hoc information retrieval
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Sheffield, United Kingdom
SESSION: Language models
Pages: 194-201
Year of Publication: 2004
ISBN:1-58113-881-4
Authors
Oren Kurland  Cornell University, Ithaca, NY
Lillian Lee  Cornell University, Ithaca, NY
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 14,   Downloads (12 Months): 113,   Citation Count: 31
DOI: http://doi.acm.org/10.1145/1008992.1009027

ABSTRACT

Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into account the structure of the surrounding corpus. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the incorporation of information drawn from clusters of similar documents. Using this framework, we develop a suite of new algorithms. Even the simplest typically outperforms the standard language-modeling approach in precision and recall, and our new interpolation algorithm posts statistically significant improvements for both metrics over all three corpora tested.
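
As a rough illustration of the idea in the abstract, the sketch below mixes a document's query likelihood with the query likelihoods of the clusters the document belongs to. It is not the paper's algorithm: the additive smoothing, the uniform averaging over clusters, the mixing weight `lam`, and all function names are assumptions made for this example only.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab_size, alpha=1.0):
    """Return an additively smoothed unigram model as a function term -> probability.

    The smoothing scheme and `alpha` are assumptions of this sketch.
    """
    counts = Counter(tokens)
    total = len(tokens)

    def prob(term):
        return (counts[term] + alpha) / (total + alpha * vocab_size)

    return prob


def query_likelihood(query, lm):
    """Probability of the query under a unigram model (product of term probabilities)."""
    return math.prod(lm(term) for term in query)


def interpolated_score(query, doc_lm, cluster_lms, lam=0.5):
    """Mix document-level evidence with evidence from the document's clusters.

    score = lam * p_doc(q) + (1 - lam) * mean over clusters of p_cluster(q)
    Uniform weighting over clusters is a simplifying assumption.
    """
    doc_part = query_likelihood(query, doc_lm)
    if not cluster_lms:
        return doc_part
    cluster_part = sum(query_likelihood(query, c) for c in cluster_lms) / len(cluster_lms)
    return lam * doc_part + (1 - lam) * cluster_part


# Toy corpus: d1 and d3 are similar and share a cluster; d2 is on its own.
docs = {
    "d1": "cat sat mat".split(),
    "d2": "dog ran park".split(),
    "d3": "cat purr mat".split(),
}
vocab_size = len({t for toks in docs.values() for t in toks})
doc_lms = {name: unigram_lm(toks, vocab_size) for name, toks in docs.items()}
cluster_a = unigram_lm(docs["d1"] + docs["d3"], vocab_size)
cluster_b = unigram_lm(docs["d2"], vocab_size)

query = ["purr"]
score_d1 = interpolated_score(query, doc_lms["d1"], [cluster_a])
score_d2 = interpolated_score(query, doc_lms["d2"], [cluster_b])
```

In this toy setup neither `d1` nor `d2` contains the query term, so their document-only scores tie. The cluster containing `d3` supplies evidence for `purr`, which lifts `d1` above `d2` under the mixed score.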


