A comparative evaluation of different link types on enhancing document clustering
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Singapore, Singapore
SESSION: Clustering--2
Pages 555-562  
Year of Publication: 2008
ISBN: 978-1-60558-164-4
Authors
Xiaodan Zhang  Drexel University, Philadelphia, PA, USA
Xiaohua Hu  Drexel University, Philadelphia, PA, USA
Xiaohua Zhou  Drexel University, Philadelphia, PA, USA
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1390334.1390429

ABSTRACT

With a growing number of works using link information to enhance document clustering, a comparative evaluation of the impact of different link types on document clustering has become necessary. Various types of links between text documents convey topic similarity or topic transferring patterns, which are very useful for document clustering: explicit links such as citation links and hyperlinks, implicit links such as co-authorship links, and pseudo links such as content similarity links. In this study, we adopt a Relaxation Labeling (RL)-based clustering algorithm, which employs both content and linkage information, to evaluate the effectiveness of these link types for document clustering on eight datasets. The experimental results show that linkage is quite effective in improving content-based document clustering. Furthermore, our experiments reveal a series of interesting findings regarding the impact of different link types on document clustering.
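The abstract does not spell out the RL update rule, so the following is only a minimal sketch of the general idea of combining content and linkage, not the authors' algorithm. It assumes each document starts with a content-based probability distribution over clusters and is repeatedly nudged toward the average distribution of its linked neighbors. The function names, the mixing weight `alpha`, and the toy data are all hypothetical.

```python
def relaxation_labeling(content_probs, links, alpha=0.5, iterations=10):
    """Iteratively blend content-based cluster probabilities with neighbors' estimates.

    content_probs: one probability distribution over clusters per document.
    links: dict mapping a document index to the indices of its linked documents.
    alpha: weight on content evidence (an assumed parameter, not from the paper).
    """
    num_clusters = len(content_probs[0])
    current = [row[:] for row in content_probs]
    for _ in range(iterations):
        updated = []
        for i, content in enumerate(content_probs):
            neighbors = links.get(i, [])
            if neighbors:
                # Average the neighbors' current cluster estimates.
                neighbor_avg = [
                    sum(current[j][c] for j in neighbors) / len(neighbors)
                    for c in range(num_clusters)
                ]
            else:
                # Without links, fall back to content evidence alone.
                neighbor_avg = content
            mixed = [
                alpha * content[c] + (1 - alpha) * neighbor_avg[c]
                for c in range(num_clusters)
            ]
            total = sum(mixed)
            updated.append([m / total for m in mixed])
        current = updated
    return current


def assign_clusters(probs):
    """Pick the most probable cluster for each document."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Toy data (hypothetical): document 2 is ambiguous by content alone,
# but it is linked to documents 0 and 1, which clearly belong to cluster 0.
content_probs = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55], [0.1, 0.9]]
links = {0: [2], 1: [2], 2: [0, 1]}

content_only = assign_clusters(content_probs)
with_links = assign_clusters(relaxation_labeling(content_probs, links))
```

In this toy case, content alone places document 2 in cluster 1, while the link-aware update moves it to cluster 0 alongside its linked neighbors. Any of the link types named in the abstract (citation, hyperlink, co-authorship, or content similarity) could populate `links` in such a sketch.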


