| A comparison of implicit and explicit links for web page classification |
| Source
|
International World Wide Web Conference
Proceedings of the 15th international conference on World Wide Web
Edinburgh, Scotland
SESSION: Data mining classification
Pages: 643 - 650
Year of Publication: 2006
ISBN: 1-59593-323-9
|
|
Authors
| Dou Shen | Hong Kong University of Science and Technology, Kowloon, Hong Kong |
| Jian-Tao Sun | Microsoft Research Asia, Beijing, P.R. China |
| Qiang Yang | Hong Kong University of Science and Technology, Kowloon, Hong Kong |
| Zheng Chen | Microsoft Research Asia, Beijing, P.R. China |
ABSTRACT
It is well known that Web-page classification can be enhanced by using hyperlinks that provide linkages between Web pages. However, in the Web space, hyperlinks are usually sparse and noisy, and thus in many situations can provide only limited help in classification. In this paper, we extend the concept of linkages from explicit hyperlinks to implicit links built between Web pages. Observing that people who search the Web with the same queries often click on different but related documents, we draw implicit links between Web pages that are clicked after the same queries. We provide an approach for automatically building the implicit links between Web pages using Web query logs, together with a thorough comparison between the uses of implicit and explicit links in Web page classification. Our experimental results on a large dataset confirm that using implicit links yields better classification performance than using explicit links, with an increase of more than 10.5% in terms of the Macro-F1 measure.
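The idea of linking pages that are clicked after the same query can be illustrated with a minimal Python sketch. This is not the authors' implementation: the log format (pairs of query string and clicked URL), the function name `build_implicit_links`, the lowercase normalization of queries, and the use of shared-query counts as link weights are all assumptions made for illustration, and the example URLs are invented.

```python
from collections import defaultdict
from itertools import combinations


def build_implicit_links(click_log):
    """Build weighted implicit links from a click log.

    click_log: iterable of (query, clicked_url) pairs. This format is a
    hypothetical simplification of a real Web query log.
    Returns a dict mapping an ordered URL pair to the number of distinct
    (normalized) queries after which both URLs were clicked.
    """
    # Group clicked pages by query. Lowercasing and stripping whitespace
    # is an assumed normalization step, not one stated in the abstract.
    pages_by_query = defaultdict(set)
    for query, url in click_log:
        pages_by_query[query.strip().lower()].add(url)

    # Every pair of pages clicked after the same query gets an implicit
    # link; the weight counts how many queries the pair shares.
    link_weights = defaultdict(int)
    for urls in pages_by_query.values():
        for a, b in combinations(sorted(urls), 2):
            link_weights[(a, b)] += 1
    return dict(link_weights)


# Toy log with invented queries and URLs.
log = [
    ("python tutorial", "a.example/py"),
    ("python tutorial", "b.example/learn"),
    ("Python Tutorial", "c.example/docs"),
    ("cheap flights", "d.example/air"),
    ("learn python", "a.example/py"),
    ("learn python", "b.example/learn"),
]
links = build_implicit_links(log)
for pair, weight in sorted(links.items()):
    print(pair, weight)
```

In this toy log, the two Python-related pages that share two queries receive the strongest link, while the page clicked only after an unrelated query stays unlinked. A classifier could then treat such links the way it would treat hyperlinks, for example by borrowing features or labels from linked neighbors.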