ABSTRACT
Exploiting information induced from (query-specific) clustering of top-retrieved documents has long been proposed as a means for improving precision at the very top ranks of the returned results. We present a novel language model approach to ranking query-specific clusters by the presumed percentage of relevant documents that they contain. While most previous cluster ranking approaches focus on the cluster as a whole, our model also exploits information induced from documents associated with the cluster. Our model substantially outperforms previous approaches for identifying clusters containing a high percentage of relevant documents. Furthermore, using the model to produce a document ranking yields precision-at-top-ranks performance that is consistently better than that of the initial ranking upon which clustering is performed; the performance also compares favorably with that of a state-of-the-art pseudo-feedback retrieval method.
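The abstract does not give the model's estimates, so the following is only a rough illustration of the general idea, not the paper's formulation. It ranks clusters with a hypothetical score that interpolates, with an assumed weight `lam`, two pieces of evidence: a cluster-level query likelihood (the concatenated cluster treated as one pseudo-document) and the mean query likelihood of the cluster's member documents. It then produces a document ranking by listing documents cluster by cluster. Dirichlet smoothing, the add-one floor on the collection model, the parameter values, and the function names `query_log_likelihood`, `score_cluster`, and `rank_documents` are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def query_log_likelihood(query: Sequence[str], text: Sequence[str],
                         coll_tf: Counter, coll_len: int,
                         mu: float = 1000.0) -> float:
    """log p(query | text) under a Dirichlet-smoothed unigram language model."""
    tf = Counter(text)
    vocab = len(coll_tf)
    score = 0.0
    for term in query:
        # Add-one floor on the collection model so unseen query terms never give log(0).
        p_coll = (coll_tf[term] + 1) / (coll_len + vocab)
        score += math.log((tf[term] + mu * p_coll) / (len(text) + mu))
    return score


def score_cluster(query: Sequence[str], member_texts: List[Sequence[str]],
                  coll_tf: Counter, coll_len: int,
                  lam: float = 0.5, mu: float = 1000.0) -> float:
    """Hypothetical cluster score mixing cluster-level and member-document evidence."""
    # Cluster as a whole: the concatenation of its members acts as one pseudo-document.
    merged = [tok for text in member_texts for tok in text]
    whole = query_log_likelihood(query, merged, coll_tf, coll_len, mu)
    # Document-level evidence: mean query log-likelihood of the member documents.
    per_doc = sum(query_log_likelihood(query, t, coll_tf, coll_len, mu)
                  for t in member_texts) / len(member_texts)
    # Interpolating in log space amounts to a weighted geometric mean of the two likelihoods.
    return lam * whole + (1 - lam) * per_doc


def rank_documents(query: Sequence[str], docs: Dict[str, List[str]],
                   clusters: Dict[str, List[str]],
                   lam: float = 0.5, mu: float = 1000.0) -> List[str]:
    """Order documents by their highest-ranked cluster, then by their own score."""
    coll_tf = Counter(tok for text in docs.values() for tok in text)
    coll_len = sum(coll_tf.values())

    def doc_score(doc_id: str) -> float:
        return query_log_likelihood(query, docs[doc_id], coll_tf, coll_len, mu)

    ranked_clusters = sorted(
        clusters,
        key=lambda cid: score_cluster(query, [docs[d] for d in clusters[cid]],
                                      coll_tf, coll_len, lam, mu),
        reverse=True)
    seen, ranking = set(), []
    for cid in ranked_clusters:
        for doc_id in sorted(clusters[cid], key=doc_score, reverse=True):
            if doc_id not in seen:  # overlapping clusters: keep the first occurrence only
                seen.add(doc_id)
                ranking.append(doc_id)
    return ranking


# Toy usage: two clusters over four tokenized documents.
docs = {
    "d1": "apple banana apple".split(),
    "d2": "apple fruit".split(),
    "d3": "car engine".split(),
    "d4": "car wheel".split(),
}
clusters = {"c1": ["d1", "d2"], "c2": ["d3", "d4"]}
ranking = rank_documents(["apple"], docs, clusters)
print(ranking)
```

Mixing the two terms is one simple way to let member documents contribute evidence beyond the cluster as a whole; the paper's actual estimator may weight, select, or smooth member documents quite differently.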