ABSTRACT
The cluster hypothesis states: closely related documents tend to be relevant to the same request. We exploit this hypothesis directly by adjusting ad hoc retrieval scores from an initial retrieval so that topically related documents receive similar scores. We refer to this process as score regularization. Score regularization can be presented as an optimization problem, allowing the use of results from semi-supervised learning. We demonstrate that regularized scores consistently and significantly rank documents better than unregularized scores, given a variety of initial retrieval algorithms. We evaluate our method on two large corpora across a substantial number of topics.
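The following is a minimal sketch of what such an optimization could look like; it is not taken from the paper. It assumes a quadratic objective of the kind used in graph-based semi-supervised learning: a smoothness term that penalizes score differences between related documents, plus a fidelity term that keeps the new scores near the initial ones. The function name `regularize_scores`, the affinity matrix `W`, and the trade-off parameter `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np

def regularize_scores(y, W, alpha=0.5):
    """Smooth initial retrieval scores y over a document-affinity graph W.

    Assumed objective (an illustration, not the paper's stated form):
        minimize  alpha * f^T (I - S) f  +  (1 - alpha) * ||f - y||^2
    where S is the symmetrically normalized affinity matrix.
    Setting the gradient to zero gives (I - alpha * S) f = (1 - alpha) * y.
    """
    W = np.asarray(W, dtype=float)   # symmetric, non-negative affinities
    y = np.asarray(y, dtype=float)   # initial retrieval scores
    d = W.sum(axis=1)                # node degrees
    d[d == 0] = 1.0                  # avoid division by zero for isolated documents
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = len(y)
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, y)

# Toy example: documents 0 and 1 are topically related, document 2 is isolated.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
y = np.array([1.0, 0.0, 0.5])
f = regularize_scores(y, W, alpha=0.5)
```

In this toy case the scores of the two related documents move toward each other, and document 1 overtakes the isolated document 2 in the ranking, which is the behaviour the cluster hypothesis motivates. How the affinity graph is built and how `alpha` is tuned are left open here.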