ABSTRACT
This paper investigates the preconditions for successfully combining document representations formed from structural markup for the task of known-item search. Because this task closely resembles work in meta-search and data fusion, we adapt several hypotheses from those research areas and test them in this context. To do so, we present a mixture-based language model and also examine many current meta-search algorithms. We find that compatible output from systems is important for successful combination of document representations. We also demonstrate that combining low-performing document representations can improve performance, but not consistently. The techniques best suited for this task are robust to the inclusion of poorly performing document representations. Finally, we explore the variance of results across systems and its impact on fusion performance, with the surprising result that correct documents have higher variance across document representations than highly ranked incorrect documents.