ABSTRACT
Fitting enough information from webpages to make browsing on small screens compelling is a challenging task. One approach is to present the user with a thumbnail image of the full web page and let the user press a single key to zoom into a region (which may then be transcoded into WML/XHTML, summarized, etc.). However, if the regions for zooming are chosen naively, the experience is frustrating because many coherent regions, sentences, images, and words may be inadvertently split. Here, we cast the web page segmentation problem into a machine learning framework, re-examining the task through the lens of entropy reduction and decision tree learning. This yields an efficient and effective page segmentation algorithm. We also demonstrate how simple techniques from computer vision can be used to fine-tune the results. When tested on a broad set of complex webpages, the resulting segmentation keeps coherent regions together.
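To make the phrase "entropy reduction" concrete, the following is a minimal sketch of how a decision-tree-style learner can pick a cut along one axis of a page. It is an illustration under stated assumptions, not the algorithm of this paper: the input format (a list of `(position, region_label)` pairs) and the function names `entropy`, `information_gain`, and `best_cut` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    total = len(labels)
    weighted = (len(left) / total) * entropy(left) \
        + (len(right) / total) * entropy(right)
    return entropy(labels) - weighted


def best_cut(samples):
    """Return (cut_position, gain) for the cut with the highest information gain.

    `samples` is a hypothetical input format: (position, region_label) pairs
    measured along a single axis. Returns None if no cut is possible.
    """
    ordered = sorted(samples)
    labels = [label for _, label in ordered]
    best = None
    for i in range(1, len(ordered)):
        # A cut cannot be placed between two samples at the same position.
        if ordered[i - 1][0] == ordered[i][0]:
            continue
        gain = information_gain(labels, labels[:i], labels[i:])
        cut = (ordered[i - 1][0] + ordered[i][0]) / 2
        if best is None or gain > best[1]:
            best = (cut, gain)
    return best


if __name__ == "__main__":
    # Hypothetical vertical positions of elements from two page regions.
    samples = [(10, "nav"), (20, "nav"), (30, "nav"),
               (200, "article"), (220, "article"), (240, "article")]
    print(best_cut(samples))
```

In this toy input the highest-gain cut falls in the gap between the two groups of labels, so neither region is split. A cut that separated elements of the same region would leave mixed labels on at least one side and score a lower gain.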