Enhancing cluster labeling using Wikipedia
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Boston, MA, USA
SESSION: Web 2.0
Pages 139-146
Year of Publication: 2009
ISBN: 978-1-60558-483-6
ABSTRACT
This work investigates cluster labeling enhancement by utilizing Wikipedia, the free on-line encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms that are extracted directly from the text. The "labeling quality" of each candidate is then evaluated by several independent judges, and the top-evaluated candidates are recommended for labeling. Our experimental results reveal that the Wikipedia labels agree with the manual labels that humans assign to a cluster much more than do significant terms extracted directly from the text. We show that in most cases, even when the human-assigned label appears in the text, purely statistical methods have difficulty identifying it as a good descriptor. Furthermore, our experiments show that for more than 85% of the clusters in our test collection, the manual label (or an inflection or a synonym of it) appears in the top five labels recommended by our system.
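The pipeline outlined in the abstract (pool candidates from two sources, score each with several judges, recommend the top ones) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not say how important terms are extracted, which judges are used, or how their scores are combined. The term extractor, `coverage_judge`, `brevity_judge`, and the averaging step are hypothetical stand-ins, and the Wikipedia candidates are simply passed in as a list rather than retrieved.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A judge scores one candidate label against the cluster's documents.
# The judges below are hypothetical stand-ins, not the paper's judges.
Judge = Callable[[str, Sequence[str]], float]


def important_terms(docs: Sequence[str], n: int = 5) -> List[str]:
    """Toy extractor: the n most frequent words longer than 3 letters."""
    counts = Counter(w for d in docs for w in d.lower().split() if len(w) > 3)
    return [w for w, _ in counts.most_common(n)]


def coverage_judge(label: str, docs: Sequence[str]) -> float:
    """Fraction of documents sharing at least one word with the label."""
    words = set(label.lower().split())
    hits = sum(1 for d in docs if words & set(d.lower().split()))
    return hits / len(docs) if docs else 0.0


def brevity_judge(label: str, docs: Sequence[str]) -> float:
    """Prefers short labels: 1.0 for one word, less for longer labels."""
    return 1.0 / len(label.split())


def recommend_labels(
    docs: Sequence[str],
    wikipedia_candidates: Sequence[str],
    judges: Sequence[Judge],
    k: int = 5,
) -> List[str]:
    """Pool both candidate sources, average judge scores, return the top k."""
    # Deduplicate case-insensitively, keeping the first surface form seen.
    pooled = {}
    for cand in [*wikipedia_candidates, *important_terms(docs)]:
        pooled.setdefault(cand.lower(), cand)
    candidates = list(pooled.values())

    # Assumed aggregation: the unweighted mean of all judge scores.
    scores = {c: sum(j(c, docs) for j in judges) / len(judges) for c in candidates}
    return sorted(candidates, key=lambda c: scores[c], reverse=True)[:k]
```

Calling `recommend_labels(docs, wikipedia_candidates, [coverage_judge, brevity_judge], k=5)` returns up to five labels ranked by mean judge score. Treating every judge as a function with the same signature keeps the judges independent of one another, so one can be added or replaced without touching the ranking step.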