ABSTRACT
We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information, and where both queries and documents may belong to more than one category of this taxonomy. We present a systematic approach to diversifying results that aims to minimize the risk of dissatisfaction of the average user. We propose an algorithm that approximates this objective well in general and is provably optimal for a natural special case. Furthermore, we generalize several classical IR metrics, including NDCG, MRR, and MAP, to explicitly account for the value of diversification. We demonstrate empirically that our algorithm scores higher on these generalized metrics than the results produced by commercial search engines.
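The abstract does not spell out the objective or the algorithm, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that "risk of dissatisfaction of the average user" can be formalized with category probabilities P(c|q) for the query and a per-document satisfaction probability V(d|q,c). The objective is the probability that a user, whose intent is drawn from P(c|q), finds at least one satisfying document in the selected set. That objective is monotone submodular, so the sketch uses plain greedy selection, the standard technique that Nemhauser, Wolsey, and Fisher (1978) showed achieves a (1 - 1/e) approximation under a cardinality constraint. The sketch also shows one way to make a classical metric intent-aware: average it over categories, weighted by P(c|q). All function names (`greedy_diversify`, `objective`, `mrr_ia`) and the data layout are hypothetical.

```python
"""Sketch of intent-aware diversification (hypothetical names and data layout).

Assumptions: p_cat[c] = P(c | q) sums to 1, and quality[d][c] = V(d | q, c) is
the probability that document d satisfies a user whose intent is category c.
"""
from typing import Dict, List, Set


def objective(selected: List[str], p_cat: Dict[str, float],
              quality: Dict[str, Dict[str, float]]) -> float:
    """sum_c P(c|q) * (1 - prod_{d in S} (1 - V(d|q,c)))."""
    total = 0.0
    for c, p in p_cat.items():
        miss = 1.0  # probability that no selected document satisfies intent c
        for d in selected:
            miss *= 1.0 - quality[d].get(c, 0.0)
        total += p * (1.0 - miss)
    return total


def greedy_diversify(docs: List[str], p_cat: Dict[str, float],
                     quality: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Greedily add the document with the largest marginal gain in objective()."""
    # weight[c] = P(c|q) * probability that intent c is still unsatisfied by S
    weight = dict(p_cat)
    selected: List[str] = []
    remaining = list(docs)
    for _ in range(min(k, len(remaining))):
        # Marginal gain of adding d is sum_c weight[c] * V(d|q,c)
        best = max(remaining, key=lambda d: sum(
            w * quality[d].get(c, 0.0) for c, w in weight.items()))
        selected.append(best)
        remaining.remove(best)
        for c in weight:
            weight[c] *= 1.0 - quality[best].get(c, 0.0)
    return selected


def mrr_ia(ranking: List[str], p_cat: Dict[str, float],
           relevant: Dict[str, Set[str]]) -> float:
    """Intent-aware MRR: sum_c P(c|q) * reciprocal rank of the first doc relevant to c."""
    total = 0.0
    for c, p in p_cat.items():
        for rank, d in enumerate(ranking, start=1):
            if d in relevant.get(c, set()):
                total += p / rank
                break
    return total
```

For example, with two intents for an ambiguous query (`{"car": 0.6, "animal": 0.4}`) and documents `car1` (V=0.9 for car), `car2` (V=0.8 for car) and `cat1` (V=0.9 for animal), choosing k=2 picks `car1` and then `cat1` rather than the second car page, because the car intent is already mostly covered. The same weighted-average pattern used in `mrr_ia` could wrap NDCG or MAP instead of reciprocal rank.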
CITED BY 2

Paul Clough, Mark Sanderson, Murad Abouammoh, Sergio Navarro, and Monica Paramita. Multiple approaches to analysing query diversity. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, July 19-23, 2009, Boston, MA, USA.