ABSTRACT
We present a non-traditional retrieval problem that we call subtopic retrieval: finding documents that cover many different subtopics of a query topic. In this problem, the utility of a document in a ranking depends on the other documents in the ranking, which violates the independent-relevance assumption made by most traditional retrieval methods. Subtopic retrieval poses challenges both for evaluating performance and for developing effective algorithms. We propose an evaluation framework for subtopic retrieval that generalizes the traditional precision and recall metrics by accounting for intrinsic topic difficulty as well as redundancy in documents. We propose and systematically evaluate several methods for performing subtopic retrieval using statistical language models and a maximal marginal relevance (MMR) ranking strategy. A mixture model combined with query likelihood relevance ranking is shown to modestly outperform a baseline relevance ranking on a data set used in the TREC interactive track.
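The dependence between documents in a ranking can be illustrated with a small greedy MMR re-ranker. This is a minimal sketch in Python, not the method evaluated in the paper: the abstract does not give the scoring functions, so the relevance scores, the Jaccard similarity over term sets, the trade-off weight `lam`, and the `subtopic_coverage` measure (fraction of known subtopics covered by the top k documents) are all assumptions introduced here for illustration, as is the toy data.

```python
from typing import Callable, Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two term sets (assumed stand-in for a novelty model)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_rerank(
    candidates: List[str],
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    lam: float = 0.5,
) -> List[str]:
    """Greedy MMR: repeatedly pick the document with the best trade-off between
    its own relevance and its redundancy with the documents already ranked."""
    ranked: List[str] = []
    remaining = list(candidates)
    while remaining:
        def mmr_score(doc: str) -> float:
            # Redundancy is the highest similarity to any already-ranked document.
            redundancy = max((similarity(doc, s) for s in ranked), default=0.0)
            return lam * relevance[doc] - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def subtopic_coverage(
    ranking: List[str], subtopics: Dict[str, Set[str]], k: int, total: int
) -> float:
    """Fraction of the topic's subtopics covered by the top-k documents."""
    covered: Set[str] = set()
    for doc in ranking[:k]:
        covered |= subtopics.get(doc, set())
    return len(covered) / total


# Hypothetical toy data: d1 and d2 are near-duplicates, d3 covers a new subtopic.
terms = {"d1": {"x", "y"}, "d2": {"x", "y"}, "d3": {"z"}}
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.6}
subtopics = {"d1": {"a", "b"}, "d2": {"a", "b"}, "d3": {"c"}}

baseline_ranking = sorted(relevance, key=relevance.get, reverse=True)
mmr_ranking = mmr_rerank(
    list(relevance), relevance, lambda p, q: jaccard(terms[p], terms[q])
)

baseline_cov = subtopic_coverage(baseline_ranking, subtopics, k=2, total=3)
mmr_cov = subtopic_coverage(mmr_ranking, subtopics, k=2, total=3)
```

On this toy data the relevance-only ranking places the two near-duplicates first, while the MMR ranking promotes the document that covers a new subtopic, so the top two results cover more subtopics. Setting `lam` closer to 1 moves the behavior back toward plain relevance ranking.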