A comparative evaluation of different link types on enhancing document clustering
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Singapore, Singapore
SESSION: Clustering--2
Pages 555-562
Year of Publication: 2008
ISBN: 978-1-60558-164-4
ABSTRACT
As a growing number of studies use link information to enhance document clustering, a comparative evaluation of how different link types affect clustering has become necessary. Links between text documents come in several types: explicit links such as citation links and hyperlinks, implicit links such as co-authorship links, and pseudo links such as content similarity links. All of them convey topic similarity or topic-transfer patterns, which are useful for document clustering. In this study, we adopt a Relaxation Labeling (RL)-based clustering algorithm, which employs both content and linkage information, to evaluate the effectiveness of these link types for document clustering on eight datasets. The experimental results show that linkage is effective in improving content-based document clustering. Our experiments also yield a series of findings on how the individual link types affect document clustering.
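To make the idea of combining content and linkage concrete, here is a minimal sketch. It is not the authors' algorithm: the abstract gives no update rule, so the code uses a simplified iterative relabeling loop with hard assignments, loosely in the spirit of relaxation labeling. The function names, the `alpha` weight, the scoring formula, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dicts)."""
    dot = sum(v * b.get(term, 0) for term, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum of term counts over a cluster; empty clusters give an empty dict."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return dict(total)


def link_aware_clustering(docs, links, init_labels, k, alpha=0.5, iters=10):
    """Relabel documents from content similarity plus neighbor labels.

    docs:  list of term-count dicts
    links: dict mapping a document index to a list of linked indices
    alpha: hypothetical weight; 1.0 means content only, 0.0 means links only
    """
    labels = list(init_labels)
    for _ in range(iters):
        cents = [
            centroid([docs[i] for i in range(len(docs)) if labels[i] == c])
            for c in range(k)
        ]
        new_labels = []
        for i, doc in enumerate(docs):
            nbrs = links.get(i, [])
            scores = []
            for c in range(k):
                content = cosine(doc, cents[c])
                # Fraction of linked documents currently assigned to cluster c.
                link = sum(labels[j] == c for j in nbrs) / len(nbrs) if nbrs else 0.0
                scores.append(alpha * content + (1 - alpha) * link)
            new_labels.append(max(range(k), key=scores.__getitem__))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels


# Toy data (invented): document 4 has ambiguous content but links to 0 and 1.
docs = [
    {"cluster": 2, "link": 1},
    {"cluster": 1, "link": 2},
    {"protein": 2, "gene": 1},
    {"gene": 2, "protein": 1},
    {"link": 1, "gene": 1},
]
links = {0: [1, 4], 1: [0, 4], 2: [3], 3: [2], 4: [0, 1]}
init_labels = [0, 0, 1, 1, 1]

with_links = link_aware_clustering(docs, links, init_labels, k=2, alpha=0.5)
content_only = link_aware_clustering(docs, links, init_labels, k=2, alpha=1.0)
print(with_links, content_only)
```

In this toy setup, document 4 is the only one whose assignment depends on whether link evidence is used, which illustrates the kind of effect the study measures. A fuller relaxation labeling formulation would keep a probability distribution over clusters for each document instead of a single hard label.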