Enhancing cluster labeling using Wikipedia
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Boston, MA, USA
Session: Web 2.0
Pages: 139-146
Year of Publication: 2009
ISBN: 978-1-60558-483-6
Authors
David Carmel, IBM Haifa Research Lab, Haifa, Israel
Haggai Roitman, IBM Haifa Research Lab, Haifa, Israel
Naama Zwerdling, IBM Haifa Research Lab, Haifa, Israel
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1571941.1571967

ABSTRACT

This work investigates how cluster labeling can be enhanced by using Wikipedia, the free online encyclopedia. We describe a general framework for cluster labeling that extracts candidate labels from Wikipedia in addition to important terms extracted directly from the text. The "labeling quality" of each candidate is then evaluated by several independent judges, and the top-rated candidates are recommended as labels.
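The abstract does not say how the judges score candidates or how their scores are combined, so the following is only a minimal Python sketch of the pool-score-rank flow it outlines, under stated assumptions: the `Judge` signature, the [0, 1] score range, averaging as the aggregation rule, and both toy judges are hypothetical and are not the authors' method.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical judge signature: (candidate label, cluster terms) -> score in [0, 1].
Judge = Callable[[str, List[str]], float]


def recommend_labels(
    cluster_terms: List[str],
    wiki_candidates: Iterable[str],
    judges: List[Judge],
    k: int = 5,
) -> List[str]:
    """Pool text terms and Wikipedia candidates, score them, return the top k.

    Assumption: a candidate's final score is the mean of all judges' scores.
    """
    if not judges:
        raise ValueError("at least one judge is required")
    # Candidate pool: important text terms first, then Wikipedia-derived labels,
    # de-duplicated while keeping first-seen order.
    pool = list(dict.fromkeys([*cluster_terms, *wiki_candidates]))
    scores: Dict[str, float] = {
        cand: sum(judge(cand, cluster_terms) for judge in judges) / len(judges)
        for cand in pool
    }
    # Highest mean score first; sorted() is stable, so ties keep pool order.
    return sorted(pool, key=lambda c: scores[c], reverse=True)[:k]


# Two toy judges, for illustration only (hypothetical, not from the paper).
def overlap_judge(cand: str, terms: List[str]) -> float:
    """Fraction of the candidate's words that occur among the cluster terms."""
    term_set = {t.lower() for t in terms}
    words = cand.lower().split()
    return sum(w in term_set for w in words) / len(words) if words else 0.0


def brevity_judge(cand: str, terms: List[str]) -> float:
    """Prefer shorter labels: 1 / number of words."""
    words = cand.split()
    return 1.0 / len(words) if words else 0.0
```

With these toy judges, single-word text terms such as "hockey" outrank a Wikipedia title such as "National Hockey League"; that reflects only the toy scoring, not the judges the paper uses. Averaging is just one aggregation choice; a rank-based combination would be less sensitive to judges whose scores live on different scales.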

Our experimental results reveal that the Wikipedia labels agree with the manual labels that humans assign to a cluster much more than the significant terms extracted directly from the text do. We show that, in most cases, even when the human-assigned label appears in the text, purely statistical methods have difficulty identifying it as a good descriptor. Furthermore, our experiments show that for more than 85% of the clusters in our test collection, the manual label (or an inflection or synonym of it) appears among the top five labels recommended by our system.
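The "more than 85% of clusters" result is a hit-rate measure: a cluster counts as a hit if its manual label, an inflection of it, or a synonym appears among the top K recommended labels (K = 5 in the abstract). A minimal Python sketch of that computation follows; the suffix-stripping `normalize` helper and the hand-supplied `synonyms` table are hypothetical stand-ins, since the abstract does not say how inflections and synonyms were detected.

```python
from typing import Dict, List, Set


def normalize(label: str) -> str:
    """Crude inflection handling (assumption): lowercase and drop a plural "s"."""
    label = label.lower().strip()
    if len(label) > 3 and label.endswith("s") and not label.endswith("ss"):
        return label[:-1]
    return label


def is_match(candidate: str, manual: str, synonyms: Dict[str, Set[str]]) -> bool:
    """True if the candidate equals the manual label, an inflection, or a listed synonym."""
    acceptable = {normalize(manual)} | {normalize(s) for s in synonyms.get(manual, set())}
    return normalize(candidate) in acceptable


def match_at_k(
    recommended: Dict[str, List[str]],
    manual: Dict[str, str],
    synonyms: Dict[str, Set[str]],
    k: int = 5,
) -> float:
    """Fraction of clusters whose manual label matches one of the top-k recommendations."""
    if not manual:
        return 0.0  # no clusters to evaluate
    hits = sum(
        any(is_match(c, label, synonyms) for c in recommended.get(cid, [])[:k])
        for cid, label in manual.items()
    )
    return hits / len(manual)
```

Here `recommended` maps a cluster id to its ranked label list and `manual` maps the same id to the human-assigned label; a real evaluation would replace the suffix rule and synonym table with proper stemming and a synonym resource.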


