Using web structure for classifying and describing web pages
Source: International World Wide Web Conference archive
Proceedings of the 11th international conference on World Wide Web
Honolulu, Hawaii, USA
SESSION: Description and Analysis
Pages: 562-569
Year of Publication: 2002
ISBN:1-58113-449-5
Authors
Eric J. Glover  NEC Research Institute, Princeton, NJ
Kostas Tsioutsiouliklis  NEC Research Institute, Princeton, NJ and Princeton University, Princeton, NJ
Steve Lawrence  NEC Research Institute, Princeton, NJ
David M. Pennock  NEC Research Institute, Princeton, NJ
Gary W. Flake  NEC Research Institute, Princeton, NJ
Sponsors
ACM: Association for Computing Machinery
: WWW'02
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 74,   Citation Count: 42
DOI: http://doi.acm.org/10.1145/511446.511520

ABSTRACT

The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents (documents that link to the target document) for search. We analyze the relative utility of document text, and the text in citing documents near the citation, for classification and description. Results show that the text in citing documents, when available, often has greater discriminative and descriptive power than the text in the target document itself. The combination of evidence from a document and citing documents can improve on either information source alone. Moreover, by ranking words and phrases in the citing documents according to expected entropy loss, we are able to accurately name clusters of web pages, even with very few positive examples. Our results confirm, quantify, and extend previous research using web structure in these areas, introducing new methods for classification and description of pages.
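The abstract does not spell out how words and phrases are scored, so the following is only a minimal sketch of ranking features by expected entropy loss. It assumes the standard information-theoretic definition (prior class entropy minus the expected class entropy after observing whether a feature is present) and may differ from the paper's exact formulation. The function names and the toy documents are hypothetical, chosen purely for illustration.

```python
import math


def entropy(p):
    """Binary entropy, in bits, of an event with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def expected_entropy_loss(pos_with, pos_total, neg_with, neg_total):
    """Expected entropy loss of a binary feature (assumed standard definition).

    pos_with / neg_with: positive / negative documents containing the feature.
    pos_total / neg_total: sizes of the positive and negative sets.
    """
    total = pos_total + neg_total
    if total == 0:
        raise ValueError("need at least one document")
    with_f = pos_with + neg_with
    without_f = total - with_f

    # Entropy of the class label before looking at the feature.
    prior = entropy(pos_total / total)

    # Class entropy within each branch (feature present / feature absent).
    h_with = entropy(pos_with / with_f) if with_f else 0.0
    h_without = entropy((pos_total - pos_with) / without_f) if without_f else 0.0

    # Weight each branch by how often it occurs.
    posterior = (with_f / total) * h_with + (without_f / total) * h_without
    return prior - posterior


def rank_features(pos_docs, neg_docs):
    """Rank features, highest expected entropy loss first.

    pos_docs and neg_docs are lists of sets of features (words or phrases),
    for example drawn from text near citations of each page.
    """
    vocab = set().union(*pos_docs, *neg_docs)
    scored = []
    for feature in vocab:
        pos_with = sum(feature in doc for doc in pos_docs)
        neg_with = sum(feature in doc for doc in neg_docs)
        score = expected_entropy_loss(pos_with, len(pos_docs), neg_with, len(neg_docs))
        scored.append((score, feature))
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical toy data: two positive and two negative documents.
pos = [{"wine", "red"}, {"wine", "cellar"}]
neg = [{"red", "car"}, {"news"}]
ranked = rank_features(pos, neg)
```

In this toy data, "wine" appears in every positive document and in no negative one, so it removes all uncertainty about the class and ranks first, while "red" is split evenly between the two sets and scores zero. A score like this favors features that separate the sets in either direction, so a naming application would presumably keep only features that are more frequent in the positive set.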


