Towards language-independent web genre detection
Source
International World Wide Web Conference
Proceedings of the 18th International Conference on World Wide Web
Madrid, Spain
POSTER SESSION: Thursday, April 23, 2009
Pages 1157-1158
Year of Publication: 2009
ISBN: 978-1-60558-487-4
Authors
Philipp Scholl, Technische Universität Darmstadt, Darmstadt, Germany
Renato Domínguez García, Technische Universität Darmstadt, Darmstadt, Germany
Doreen Böhnstedt, Technische Universität Darmstadt, Darmstadt, Germany
Christoph Rensing, Technische Universität Darmstadt, Darmstadt, Germany
Ralf Steinmetz, Technische Universität Darmstadt, Darmstadt, Germany
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1526709.1526905

ABSTRACT

The term web genre denotes the type of a given web resource, as opposed to the topic of its content. In this work, we focus on recognizing the web genres blog, wiki and forum. We present a set of features that exploit the hierarchical structure of a web page's HTML markup and therefore, unlike related approaches, do not depend on a linguistic analysis of the page's content. Our results show that very good accuracy can be achieved for fully language-independent detection of structured web genres.
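
The abstract does not list the features themselves. To illustrate the general idea, the following Python sketch derives a few purely structural statistics from a page's tag tree: nesting depth, tag counts, and root-to-node tag paths. It ignores all text nodes. The helper names (StructureExtractor, structural_features) and the particular statistics are hypothetical choices made for illustration. They are not the feature set used by the authors.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not deepen the nesting stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StructureExtractor(HTMLParser):
    """Records structural statistics of an HTML page; text nodes are ignored."""

    def __init__(self):
        super().__init__()
        self.stack = []               # currently open non-void tags
        self.max_depth = 0
        self.tag_counts = Counter()   # how often each tag name occurs
        self.path_counts = Counter()  # root-to-node tag paths, e.g. "html/body/div"

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        self.path_counts["/".join(self.stack + [tag])] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)
            self.max_depth = max(self.max_depth, len(self.stack))

    def handle_endtag(self, tag):
        # Pop back to the innermost matching open tag; stray end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    # handle_data is deliberately not overridden: page text never influences
    # the features, which is what makes them language independent.


def structural_features(html: str) -> dict:
    """Hypothetical feature set for illustration; not the features from the paper."""
    parser = StructureExtractor()
    parser.feed(html)
    parser.close()
    n_tags = sum(parser.tag_counts.values())
    features = {
        "n_tags": n_tags,
        "max_depth": parser.max_depth,
        "n_distinct_paths": len(parser.path_counts),
        "max_path_repeat": max(parser.path_counts.values(), default=0),
    }
    # Relative frequencies of a few tags (an arbitrary, illustrative selection).
    for tag in ("form", "textarea", "table", "ul", "h2"):
        features[f"freq_{tag}"] = parser.tag_counts[tag] / n_tags if n_tags else 0.0
    return features


english = "<html><body><div><h2>Re: question</h2><p>Thanks a lot!</p></div></body></html>"
german = "<html><body><div><h2>Re: Frage</h2><p>Vielen Dank!</p></div></body></html>"
print(structural_features(english) == structural_features(german))  # True
```

Text nodes never reach the feature dictionary. As a result, two pages with identical markup but content in different languages map to the same feature vector. A classifier trained only on such vectors therefore cannot tell them apart by language.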


