ABSTRACT
We develop a method for predicting query performance by computing the relative entropy between a query language model and the corresponding collection language model. The resulting clarity score measures the coherence of the language usage in documents whose models are likely to generate the query. We suggest that clarity scores measure the ambiguity of a query with respect to a collection of documents and show that they correlate positively with average precision in a variety of TREC test sets. Thus, the clarity score may be used to identify ineffective queries, on average, without relevance information. We develop an algorithm for automatically setting the clarity score threshold between predicted poorly performing queries and acceptable queries and validate it using TREC data. In particular, we compare the automatic thresholds to optimum thresholds and check how often sampling experiments that randomly assign queries to the two classes achieve results as good.
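As a rough illustration of the relative-entropy computation described above, the following Python sketch scores a query against a toy tokenized corpus. The function name `clarity_score`, the linear smoothing weight, the uniform document prior, and the base-2 logarithm are assumptions made for this example; the abstract itself does not specify them.

```python
import math
from collections import Counter


def clarity_score(query_terms, documents, smoothing=0.6):
    """Relative entropy (in bits) between a query language model and the
    collection language model. Illustrative sketch only."""
    documents = [doc for doc in documents if doc]  # ignore empty documents

    # Collection model: relative frequency of each term over all documents.
    collection_counts = Counter(t for doc in documents for t in doc)
    total = sum(collection_counts.values())
    p_coll = {w: c / total for w, c in collection_counts.items()}

    # Smoothed document models: mix each document's term frequencies with
    # the collection model (the mixing weight is an assumption).
    doc_models = []
    for doc in documents:
        counts = Counter(doc)
        doc_models.append({
            w: smoothing * counts[w] / len(doc) + (1 - smoothing) * p_coll[w]
            for w in p_coll
        })

    # Weight each document by how likely its model is to generate the
    # query (uniform document prior assumed), then normalize the weights.
    likelihoods = [
        math.prod(model.get(t, 0.0) for t in query_terms)
        for model in doc_models
    ]
    norm = sum(likelihoods)
    if norm == 0:
        raise ValueError("query has zero likelihood under every document model")
    weights = [value / norm for value in likelihoods]

    # Query model: likelihood-weighted mixture of the document models.
    p_query = {
        w: sum(wt * model[w] for wt, model in zip(weights, doc_models))
        for w in p_coll
    }

    # Relative entropy between the query model and the collection model.
    return sum(
        p * math.log2(p / p_coll[w]) for w, p in p_query.items() if p > 0
    )


if __name__ == "__main__":
    toy_docs = [
        ["the", "cat", "jungle", "prey"],
        ["the", "cat", "jungle", "hunt"],
        ["the", "car", "engine", "road"],
        ["the", "car", "engine", "speed"],
    ]
    print(clarity_score(["engine"], toy_docs))
    print(clarity_score(["the"], toy_docs))
```

In this toy corpus, a query on a term that occurs uniformly in every document ("the") yields a query model that matches the collection model, so its score is essentially zero, while a topical term ("engine") scores higher. A threshold on such scores, as the abstract describes, would then separate predicted poorly performing queries from acceptable ones.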
CITED BY 88
Eric C. Jensen, Steven M. Beitzel, David Grossman, Ophir Frieder, Abdur Chowdhury, Predicting query difficulty on the web by learning visual clues, Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, August 15-19, 2005, Salvador, Brazil
Vishwa Vinay, Ingemar J. Cox, Natasa Milic-Frayling, Ken Wood, On ranking the effectiveness of searches, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA
David Carmel, Elad Yom-Tov, Adam Darlow, Dan Pelleg, What makes a query difficult?, Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11, 2006, Seattle, Washington, USA
Ruihua Song, Zhenxiao Luo, Ji-Rong Wen, Yong Yu, Hsiao-Wuen Hon, Identifying ambiguous queries in web search, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, Ian MacKinnon, Novelty and diversity in information retrieval evaluation, Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 20-24, 2008, Singapore, Singapore
Jiyin He, Wouter Weerkamp, Martha Larson, Maarten de Rijke, Blogger, stick to your story: modeling topical noise in blogs with coherence measures, Proceedings of the second workshop on Analytics for noisy unstructured text data, p.39-46, July 24-24, 2008, Singapore
Bernard J. Jansen, Danielle L. Booth, Amanda Spink, Determining the informational, navigational, and transactional intent of Web queries, Information Processing and Management: an International Journal, v.44 n.3, p.1251-1266, May, 2008
Andrei Broder, Massimiliano Ciaramita, Marcus Fontoura, Evgeniy Gabrilovich, Vanja Josifovski, Donald Metzler, Vanessa Murdock, Vassilis Plachouras, To swing or not to swing: learning when (not) to advertise, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Hao Lang, Bin Wang, Gareth Jones, Jin-Tao Li, Fan Ding, Yi-Xuan Liu, Query performance prediction for information retrieval based on covering topic score, Journal of Computer Science and Technology, v.23 n.4, p.590-601, July 2008
Ruihua Song, Zhenxiao Luo, Jian-Yun Nie, Yong Yu, Hsiao-Wuen Hon, Identification of ambiguous queries in web search, Information Processing and Management: an International Journal, v.45 n.2, p.216-229, March, 2009
Linjun Yang, Li Wang, Bo Geng, Xian-Sheng Hua, Query sampling for ranking learning in web search, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA
Jaime Arguello, Fernando Diaz, Jamie Callan, Jean-Francois Crespo, Sources of evidence for vertical selection, Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, July 19-23, 2009, Boston, MA, USA