Stylistic and lexical co-training for web block classification
Source: Workshop on Web Information and Data Management
Proceedings of the 6th annual ACM international workshop on Web information and data management
Washington DC, USA
SESSION: Web mining and clustering
Pages: 136-143
Year of Publication: 2004
ISBN: 1-58113-978-0
Authors
Chee How Lee, National University of Singapore
Min-Yen Kan, National University of Singapore
Sandra Lai, National University of Singapore
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1031453.1031478

ABSTRACT

Many applications that use web data extract information from only a limited number of regions on a web page. As a result, dividing a web page into blocks and then classifying those blocks has become a preprocessing step. We introduce PARCELS, an open-source, co-trained approach that classifies blocks using separate stylistic and lexical views of the web page. Unlike previous work, PARCELS performs classification on fine-grained blocks. In addition to table-based layouts, the system handles real-world pages whose layout is based on divisions and spans, and it performs stylistic inference for pages that use cascading style sheets. Our evaluation shows that the co-training process yields a 28.5% reduction in error rate over a single-view classifier and that our approach is comparable to other state-of-the-art systems.
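
The co-training idea named in the abstract can be illustrated with a minimal sketch. This is not the PARCELS implementation: the nearest-centroid learner, the confidence measure, the feature vectors, and the block labels "nav" and "content" are all assumptions chosen for brevity. Each block carries two feature views (stylistic and lexical); in every round, a classifier trained on one view labels the unlabeled blocks it is most confident about and adds them to the shared training pool that the other view then learns from.

```python
from math import dist


def centroids(examples):
    """Compute one mean vector per label from (vector, label) pairs."""
    groups = {}
    for vec, label in examples:
        groups.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def predict(cents, vec):
    """Return (label, confidence) for a nearest-centroid classifier.

    Confidence is the gap between the two closest centroids (an assumption).
    """
    ranked = sorted((dist(c, vec), label) for label, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """Co-train two single-view classifiers.

    labeled:   list of ((style_vec, lex_vec), label)
    unlabeled: list of (style_vec, lex_vec)
    Returns the centroid models for the stylistic and lexical views.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):  # 0 = stylistic view, 1 = lexical view
            if not pool:
                break
            cents = centroids([(x[view], y) for x, y in labeled])
            # Most confident unlabeled blocks according to this view only.
            ranked = sorted(
                pool, key=lambda x: predict(cents, x[view])[1], reverse=True
            )
            for x in ranked[:per_round]:
                labeled.append((x, predict(cents, x[view])[0]))
                pool.remove(x)
    style_model = centroids([(x[0], y) for x, y in labeled])
    lex_model = centroids([(x[1], y) for x, y in labeled])
    return style_model, lex_model


# Hypothetical toy data: style view = 2 numeric features, lexical view = 1.
seed = [
    (([1.0, 0.0], [5.0]), "nav"),
    (([0.0, 1.0], [0.0]), "content"),
]
unlabeled_blocks = [
    ([0.9, 0.1], [4.0]),
    ([0.1, 0.9], [1.0]),
    ([0.8, 0.3], [4.5]),
    ([0.2, 0.8], [0.5]),
]
style_model, lex_model = co_train(seed, unlabeled_blocks)
```

Here two seed blocks are enough to start, and the four unlabeled blocks are absorbed over successive rounds. A real system would use stronger per-view learners and a held-out set to decide when to stop adding self-labeled examples.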



