Browsing on small screens: recasting web-page segmentation into an efficient machine learning framework
Source International World Wide Web Conference archive
Proceedings of the 15th International Conference on World Wide Web
Edinburgh, Scotland
SESSION: Adaptivity & mobility
Pages: 33 - 42  
Year of Publication: 2006
ISBN:1-59593-323-9
Author
Shumeet Baluja  Google, Inc., Mountain View, CA
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 136,   Citation Count: 12
DOI: http://doi.acm.org/10.1145/1135777.1135788

ABSTRACT

Fitting enough information from web pages to make browsing on small screens compelling is a challenging task. One approach is to present the user with a thumbnail image of the full web page and allow the user to press a single key to zoom into a region (which may then be transcoded into WML/XHTML, summarized, etc.). However, if regions for zooming are presented naively, this yields a frustrating experience because of the number of coherent regions, sentences, images, and words that may be inadvertently separated. Here, we cast the web page segmentation problem into a machine learning framework, where we re-examine this task through the lens of entropy reduction and decision tree learning. This yields an efficient and effective page segmentation algorithm. We demonstrate how simple techniques from computer vision can be used to fine-tune the results. The resulting segmentation keeps coherent regions together when tested on a broad set of complex web pages.
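
The entropy-reduction idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a hypothetical one-dimensional strip of page cells, each labeled with the name of the element that covers it, and picks the single cut that maximizes information gain (the split criterion used in decision tree learning). The names `entropy`, `best_cut`, and `strip` are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_cut(labels):
    """Return (index, gain) for the cut with the highest information gain.

    A cut at index i splits the strip into labels[:i] and labels[i:].
    Returns (None, 0.0) when no cut reduces entropy.
    """
    n = len(labels)
    base = entropy(labels)
    best_index, best_gain = None, 0.0
    for i in range(1, n):
        left, right = labels[:i], labels[i:]
        # Entropy remaining after the cut, weighted by region size.
        weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = base - weighted
        if gain > best_gain + 1e-12:
            best_index, best_gain = i, gain
    return best_index, best_gain


# Hypothetical strip: four cells covered by a "nav" element,
# followed by four cells covered by an "article" element.
strip = ["nav"] * 4 + ["article"] * 4
cut_index, gain = best_cut(strip)
print(cut_index, gain)
```

In this toy strip the highest-gain cut falls on the boundary between the two elements, so neither element is split across regions. A real page would need two-dimensional cuts and recursive application, which this sketch leaves out.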


