ABSTRACT
Previous research on cluster-based retrieval has been inconclusive as to whether it improves retrieval effectiveness over document-based retrieval. Recent developments in the language modeling approach to IR have motivated us to re-examine this problem within this new retrieval framework. We propose two new models for cluster-based retrieval and evaluate them on several TREC collections. We show that cluster-based retrieval can perform consistently across collections of realistic size, and that significant improvements over document-based retrieval can be obtained in a fully automatic manner, without relevance information provided by humans.
INDEX TERMS
Primary Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.3 Information Search and Retrieval
Subjects: Retrieval models
General Terms: Experimentation, Theory
Keywords: cluster model, cluster-based language model, cluster-based retrieval, hierarchical clustering, information retrieval, language model, query-specific clustering, smoothing, static clustering, topic model