ABSTRACT
To improve precision at the very top ranks of a document list returned in response to a query, researchers have suggested exploiting information induced from clustering the documents most highly ranked by an initial search. We propose a novel model for ranking such (query-specific) clusters by the presumed percentage of relevant documents that they contain. The model is based on (i) proposing a palette of "witness" cluster properties that purportedly correlate with this percentage, (ii) devising concrete quantitative measures for these properties, and (iii) ordering the clusters by aggregating the rankings induced by these individual measures. Empirical evaluation shows that our model is consistently more effective than previously suggested methods at detecting clusters that contain a high percentage of relevant documents. Furthermore, the precision-at-top-ranks performance of this model exceeds that of standard document-based retrieval and is competitive with that of a state-of-the-art document-based retrieval approach.
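As an illustration of step (iii), the following minimal sketch ranks clusters under several measures and then merges the resulting rankings. It rests on stated assumptions: the abstract does not say which aggregation method is used, so a Borda-style positional count is swapped in here, and the three measures (mean, minimum, and maximum initial retrieval score), the document scores, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Cluster = Sequence[str]  # a cluster is a sequence of document ids

# Hypothetical initial retrieval scores for five documents.
doc_scores: Dict[str, float] = {
    "d1": 0.9, "d2": 0.7, "d3": 0.5, "d4": 0.2, "d5": 0.1,
}

def mean_score(cluster: Cluster) -> float:
    return sum(doc_scores[d] for d in cluster) / len(cluster)

def min_score(cluster: Cluster) -> float:
    return min(doc_scores[d] for d in cluster)

def max_score(cluster: Cluster) -> float:
    return max(doc_scores[d] for d in cluster)

def rank_by(measure: Callable[[Cluster], float],
            clusters: Sequence[Cluster]) -> List[int]:
    """Return cluster indices ordered from highest to lowest measure value."""
    return sorted(range(len(clusters)),
                  key=lambda i: measure(clusters[i]),
                  reverse=True)

def borda_aggregate(rankings: Sequence[Sequence[int]], n: int) -> List[int]:
    """Merge rankings with a Borda-style count (an assumed aggregation rule).

    A cluster at position p in a ranking of n clusters earns n - 1 - p points;
    clusters are then ordered by total points, highest first.
    """
    points = [0] * n
    for ranking in rankings:
        for position, cluster_index in enumerate(ranking):
            points[cluster_index] += n - 1 - position
    return sorted(range(n), key=lambda i: points[i], reverse=True)

# Three toy query-specific clusters over the scored documents.
clusters: List[Cluster] = [["d1", "d4"], ["d2", "d3"], ["d4", "d5"]]

# One ranking per (hypothetical) witness measure, then the merged ordering.
rankings = [rank_by(m, clusters) for m in (mean_score, min_score, max_score)]
aggregated = borda_aggregate(rankings, len(clusters))
```

In this toy setting the maximum-score measure prefers the first cluster while the other two measures prefer the second, and the aggregate follows the majority. Any other rank-fusion rule could be substituted for `borda_aggregate` without changing the surrounding structure.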