ABSTRACT
Clustering is currently one of the most important techniques for dealing with the massive amount of heterogeneous information on the web, for example when locating resources or interpreting information. Unlike clustering in other fields, web page clustering separates unrelated pages and groups pages related to a specific topic into semantically meaningful clusters, which is useful for discriminating, summarizing, organizing, and navigating unstructured web pages. We have proposed a contents-link coupled clustering algorithm that clusters web pages by combining content analysis and link analysis. In this paper, we study in particular the effects of out-links (links from a web page), in-links (links to a web page), and terms on the final clustering results, as well as how to combine these three parts effectively to improve clustering quality. We apply the algorithm to cluster web search results. Preliminary experiments and evaluations were conducted on various topics, and the experimental results show that the proposed clustering algorithm is effective and promising.
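The abstract does not state how terms, out-links, and in-links are weighted against each other or which clustering procedure is applied, so the following Python sketch is only one plausible illustration, not the authors' algorithm. It assumes each page is represented by three sparse vectors, compares them with cosine similarity (shared out-links play a role similar to bibliographic coupling, shared in-links similar to co-citation), mixes the three scores with hypothetical weights `w_terms`, `w_out`, and `w_in`, and groups pages with a simple single-pass threshold clustering. The names `Page`, `combined_similarity`, `cluster`, and the parameter values are illustrative assumptions.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page representation: three sparse feature vectors."""
    url: str
    terms: Counter      # term -> frequency in the page text
    out_links: Counter  # URL this page links to -> weight
    in_links: Counter   # URL that links to this page -> weight


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b[k] for k, w in a.items())  # Counter returns 0 for missing keys
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def combined_similarity(p: Page, q: Page,
                        w_terms: float = 0.5, w_out: float = 0.25,
                        w_in: float = 0.25) -> float:
    """Assumed linear mix of content, out-link, and in-link similarity."""
    return (w_terms * cosine(p.terms, q.terms)
            + w_out * cosine(p.out_links, q.out_links)
            + w_in * cosine(p.in_links, q.in_links))


def cluster(pages, threshold=0.3, **weights):
    """Single-pass clustering: join the cluster with the highest average
    similarity if it reaches `threshold`, otherwise start a new cluster."""
    clusters = []
    for page in pages:
        best, best_sim = None, threshold
        for c in clusters:
            sim = sum(combined_similarity(page, m, **weights) for m in c) / len(c)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([page])
        else:
            best.append(page)
    return clusters


# Toy search results for the ambiguous query "jaguar".
a = Page("a", Counter({"jaguar": 3, "car": 2, "engine": 1}),
         Counter({"x.com": 1}), Counter({"hub1": 1}))
b = Page("b", Counter({"jaguar": 2, "car": 3}),
         Counter({"x.com": 1}), Counter({"hub1": 1}))
c = Page("c", Counter({"jaguar": 2, "cat": 3, "rainforest": 1}),
         Counter({"zoo.org": 1}), Counter({"hub2": 1}))

print([[p.url for p in cl] for cl in cluster([a, b, c])])  # [['a', 'b'], ['c']]
```

Pages `a` and `b` share out-links and in-links as well as terms, so they fall into one cluster, while `c` shares only the query term and starts its own. Changing the three weights is the knob that corresponds to the abstract's question of how much each part should contribute.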