| Corpus structure, language models, and ad hoc information retrieval |
| Full text |
Pdf
(214 KB)
|
| Source
|
Annual ACM Conference on Research and Development in Information Retrieval
archive
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
table of contents
Sheffield, United Kingdom
SESSION: Language models
table of contents
Pages: 194 - 201
Year of Publication: 2004
ISBN:1-58113-881-4
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 14, Downloads (12 Months): 113, Citation Count: 31
|
|
|
ABSTRACT
Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into account the structure of the surrounding corpus. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the incorporation of information drawn from clusters of similar documents. Using this framework, we develop a suite of new algorithms. Even the simplest typically outperforms the standard language-modeling approach in precision and recall, and our new interpolation algorithm posts statistically significant improvements for both metrics over all three corpora tested.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
James Allan, M.E. Connel, W. Bruce Croft, Fang-Fang Feng, D. Fisher, and X. Li. Inquery and trec-9. In Proceedings of the Ninth Text Retrieval Conference (TREC-9), pages 551--562, 2001.
|
| |
2
|
|
| |
3
|
|
| |
4
|
W. Bruce Croft. A model of cluster searching based on classification. Information Systems, 5:189--195, 1980.
|
| |
5
|
|
 |
6
|
|
 |
7
|
|
| |
8
|
|
| |
9
|
Thomas Hofmann and Jan Puzicha. Unsupervised learning from dyadic data. Technical Report TR-98-042, International Computer Science Institute (ICSI), 1998.
|
| |
10
|
Rukmini Iyer and Mari Ostendorf. Modeling long distance dependence in language: Topic mixtures vs. dynamic cache models. IEEE Transactions on Speech and Audio Processing, 7(1):30--39, 1999.
|
 |
11
|
John Lafferty , Chengxiang Zhai, Document language models, query models, and risk minimization for information retrieval, Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, p.111-119, September 2001, New Orleans, Louisiana, United States
[doi> 10.1145/383952.383970]
|
| |
12
|
|
 |
13
|
|
| |
14
|
Paul Ogilvie and Jamie Callan. Experiments using the lemur toolkit. In Proceedings of the Tenth Text Retrieval Conference (TREC-10), pages 103--108, 2001.
|
 |
15
|
|
| |
16
|
|
 |
17
|
|
 |
18
|
|
 |
19
|
|
CITED BY 31
|
|
|
|
|
|
|
|
Carlo A. Curino , Yuanyuan Jia , Bruce Lambert , Patricia M. West , Clement Yu, Mining officially unrecognized side effects of drugs by combining web search and machine learning, Proceedings of the 14th ACM international conference on Information and knowledge management, October 31-November 05, 2005, Bremen, Germany
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Tao Tao , Xuanhui Wang , Qiaozhu Mei , ChengXiang Zhai, Language model information retrieval with document expansion, Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, p.407-414, June 04-09, 2006, New York, New York
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Donald Metzler , Jasmine Novak , Hang Cui , Srihari Reddy, Building enriched document representations using aggregated anchor text, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|