ABSTRACT
We explore the relation between classical probabilistic models of information retrieval and the emerging language modeling approaches. It has long been recognized that the primary obstacle to effective performance of classical models is the need to estimate a relevance model: the probabilities of words in the relevant class. We propose a novel technique for estimating these probabilities using the query alone. We demonstrate that our technique can produce highly accurate relevance models, addressing important notions of synonymy and polysemy. Our experiments show relevance models outperforming baseline language modeling systems on TREC retrieval and TDT tracking tasks. The main contribution of this work is an effective formal method for estimating a relevance model with no training data.
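The abstract does not spell out the estimator, so the following is only a minimal sketch of one way to estimate word probabilities for the relevant class from the query alone. It assumes that P(w|R) is approximated by P(w | q_1..q_k), with the joint P(w, q_1..q_k) computed by summing P(D) P(w|D) prod_i P(q_i|D) over the document models D of a collection. The uniform document prior, the Jelinek-Mercer smoothing with lam = 0.6, the toy three-document collection, and the helper names `doc_model` and `relevance_model` are all illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from math import prod


# Hypothetical helper: Jelinek-Mercer smoothed document model P(w|D).
# The smoothing scheme and lam=0.6 are illustrative assumptions.
def doc_model(doc, coll_counts, coll_len, lam=0.6):
    counts = Counter(doc)
    n = len(doc)

    def p(w):
        p_doc = counts[w] / n if n else 0.0
        p_coll = coll_counts[w] / coll_len
        return lam * p_doc + (1 - lam) * p_coll

    return p


# Hypothetical helper: estimate P(w|R) from the query alone by approximating
# it with P(w | q_1..q_k) = P(w, q_1..q_k) / P(q_1..q_k), where
# P(w, q_1..q_k) = sum_D P(D) * P(w|D) * prod_i P(q_i|D) and P(D) is uniform.
def relevance_model(query, docs, lam=0.6):
    coll = Counter(w for d in docs for w in d)
    coll_len = sum(coll.values())
    # Terms absent from the collection have zero probability under every
    # document model, so they are dropped instead of zeroing the estimate.
    query = [q for q in query if q in coll]
    if not query:
        raise ValueError("no query term occurs in the collection")
    models = [doc_model(d, coll, coll_len, lam) for d in docs]
    # Query likelihood prod_i P(q_i|D) for each document model.
    q_lik = [prod(m(q) for q in query) for m in models]
    prior = 1.0 / len(docs)
    joint = {
        w: sum(prior * m(w) * ql for m, ql in zip(models, q_lik))
        for w in coll
    }
    # Normalising over the vocabulary divides by P(q_1..q_k).
    z = sum(joint.values())
    return {w: v / z for w, v in joint.items()}


docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "stock markets fell sharply today".split(),
]
rm = relevance_model(["cat"], docs)
for w, p in sorted(rm.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{w}\t{p:.3f}")
```

In this toy run, "dog" and "chased", which never appear in the query but co-occur with "cat" in a document that is likely under the query, receive more probability than "stock", which appears only in an unrelated document. Frequent function words such as "the" also receive high probability here, and ranking documents against the estimated model is not shown.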
CITED BY 155
Thomas R. Lynam, Chris Buckley, Charles L. A. Clarke, Gordon V. Cormack, A multi-system analysis of document and term selection for blind feedback, Proceedings of the thirteenth ACM international conference on Information and knowledge management, November 08-13, 2004, Washington, D.C., USA
Jing Bai, Dawei Song, Peter Bruza, Jian-Yun Nie, Guihong Cao, Query expansion using term relationships in language models for information retrieval, Proceedings of the 14th ACM international conference on Information and knowledge management, October 31-November 05, 2005, Bremen, Germany
H. C. Wu, R. W. P. Luk, K. F. Wong, K. L. Kwok, Probabilistic document-context based relevance feedback with limited relevance judgments, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
Changhu Wang, Feng Jing, Lei Zhang, Hong-Jiang Zhang, Image annotation refinement using random walk with restarts, Proceedings of the 14th annual ACM international conference on Multimedia, October 23-27, 2006, Santa Barbara, CA, USA
Zhang Jun-lin, Sun Le, Qu Wei-min, Sun Yu-fang, A trigger language model-based IR system, Proceedings of the 20th international conference on Computational Linguistics, p.680-es, August 23-27, 2004, Geneva, Switzerland
Tao Tao, Xuanhui Wang, Qiaozhu Mei, ChengXiang Zhai, Language model information retrieval with document expansion, Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, p.407-414, June 04-09, 2006, New York, New York
Chin-Yew Lin, Guihong Cao, Jianfeng Gao, Jian-Yun Nie, An information-theoretic approach to automatic evaluation of summaries, Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, p.463-470, June 04-09, 2006, New York, New York
Filip Radlinski, Andrei Broder, Peter Ciccolo, Evgeniy Gabrilovich, Vanja Josifovski, Lance Riedel, Optimizing relevance and revenue in ad search: a query substitution approach, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 20-24, 2008, Singapore, Singapore
Jiyin He, Wouter Weerkamp, Martha Larson, Maarten de Rijke, Blogger, stick to your story: modeling topical noise in blogs with coherence measures, Proceedings of the second workshop on Analytics for noisy unstructured text data, p.39-46, July 24, 2008, Singapore
Krisztian Balog, Toine Bogers, Leif Azzopardi, Maarten de Rijke, Antal van den Bosch, Broad expertise retrieval in sparse data environments, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
Victor Lavrenko, James Allan, Edward DeGuzman, Daniel LaFlamme, Veera Pollard, Stephen Thomas, Relevance models for topic detection and tracking, Proceedings of the second international conference on Human Language Technology Research, March 24-27, 2002, San Diego, California
Andrei Broder, Massimiliano Ciaramita, Marcus Fontoura, Evgeniy Gabrilovich, Vanja Josifovski, Donald Metzler, Vanessa Murdock, Vassilis Plachouras, To swing or not to swing: learning when (not) to advertise, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Kristen Parton, Kathleen R. McKeown, James Allan, Enrique Henestroza, Simultaneous multilingual search for translingual information retrieval, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Roelof van Zwol, Vanessa Murdock, Lluis Garcia Pueyo, Georgina Ramirez, Diversifying image search with user generated content, Proceeding of the 1st ACM international conference on Multimedia information retrieval, October 30-31, 2008, Vancouver, British Columbia, Canada
Andrei Broder, Peter Ciccolo, Evgeniy Gabrilovich, Vanja Josifovski, Donald Metzler, Lance Riedel, Jeffrey Yuan, Online expansion of rare queries for sponsored search, Proceedings of the 18th international conference on World wide web, April 20-24, 2009, Madrid, Spain