Classifying web sites
Source
International World Wide Web Conference
Proceedings of the 16th International Conference on World Wide Web
Banff, Alberta, Canada
Poster session: Search
Pages: 1143-1144
Year of Publication: 2007
ISBN: 978-1-59593-654-7
Authors
Christoph Lindemann, University of Leipzig
Lars Littig, University of Leipzig
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 74,   Citation Count: 2
DOI: http://doi.acm.org/10.1145/1242572.1242736

ABSTRACT

In this paper, we present a novel method for the classification of Web sites. This method exploits both structure and content of Web sites in order to discern their functionality. It allows for distinguishing between eight of the most relevant functional classes of Web sites. We show that a pre-classification of Web sites utilizing structural properties considerably improves a subsequent textual classification with standard techniques. We evaluate this approach on a dataset comprising more than 16,000 Web sites with about 20 million crawled and 100 million known Web pages. Our approach achieves an accuracy of 92% for the coarse-grained classification of these Web sites.
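
The two-stage idea in the abstract (a structural pre-classification followed by a textual classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the Site record, the page-count threshold, the group names, the class labels, and the keyword lists are all hypothetical placeholders chosen only to show how a coarse structural decision can narrow the set of candidate classes before any text is examined.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Site:
    pages: int  # number of crawled pages (hypothetical structural feature)
    text: str   # concatenated text of the crawled pages


# Hypothetical candidate classes and keywords per structural group.
# None of these labels or word lists are taken from the paper.
KEYWORDS = {
    "small": {
        "blog": {"post", "comment", "archive"},
        "personal": {"hobby", "family", "me"},
    },
    "large": {
        "shop": {"cart", "price", "order"},
        "academic": {"research", "faculty", "course"},
    },
}


def structural_group(site: Site) -> str:
    """Stage 1: coarse pre-classification from structure alone.

    The threshold of 100 pages is an arbitrary assumption.
    """
    return "large" if site.pages >= 100 else "small"


def textual_class(site: Site, group: str) -> str:
    """Stage 2: among the classes of this group, pick the one whose
    keywords occur most often in the site text."""
    counts = Counter(site.text.lower().split())
    scores = {
        label: sum(counts[word] for word in words)
        for label, words in KEYWORDS[group].items()
    }
    return max(scores, key=scores.get)


def classify(site: Site) -> str:
    """Run the structural stage, then the textual stage."""
    return textual_class(site, structural_group(site))
```

Under these assumptions, classify(Site(pages=500, text="add to cart price order now")) only considers the classes of the "large" group. The design point is that the textual stage has fewer classes to separate, which is one plausible reading of why a structural pre-classification could help a standard text classifier.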
