Automated conversion of table-based websites to structured stylesheets using table recognition and clone detection
Source IBM Centre for Advanced Studies Conference archive
Proceedings of the 2007 conference of the center for advanced studies on Collaborative research
Richmond Hill, Ontario, Canada
SESSION: Packaging and delivery of web pages and web applications
Pages: 12 - 26  
Year of Publication: 2007
ISSN:1705-7361
Authors
Andy Y. Mao  Queen's University, Kingston, Ontario, Canada
James R. Cordy  Queen's University, Kingston, Ontario, Canada
Thomas R. Dean  Queen's University, Kingston, Ontario, Canada
Sponsors
IBM Toronto Software Lab
IBM Centers for Advanced Studies (CAS)
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1321211.1321214

ABSTRACT

Web standards such as XHTML and CSS are rapidly coming into practice and have many advantages, including compatibility, consistency across browsers, and increased ease of maintenance. Unfortunately, large numbers of existing websites still use the deprecated table-based layout style, in which page style is unique to each page. Existing tools for automating the transition to stylesheets provide little help: they convert page by page, using a flattened structure and local inline styles rather than a common CSS stylesheet. This approach ignores hierarchical structure and defeats the main purpose of moving to the new standard, losing all of its advantages.

In this work we present an automated method for converting table-based layout websites to modern, standards-compliant, CSS stylesheet-based websites using a two-step process. Pages of the site are first converted page by page, using table recognition technology to preserve hierarchical structure and layout semantics in local styles. Software clone detection technology is then used to recognize common layout styles in the pages and to extract and minimize them into a common CSS stylesheet for the site. The result is a maintainable, efficient, modern, standards-compliant website with the same look and feel as the original but with all the maintenance advantages of a custom-programmed new site.
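
The second step can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how clone detection is performed, so exact matching of normalized inline styles stands in for it here. The function names `normalize` and `extract_stylesheet`, the generated class names, and the input format are all hypothetical.

```python
from collections import Counter


def normalize(style):
    """Canonicalize an inline style string so textual variants compare equal.

    Assumption: declarations are separated by ';' and contain no ';' in values.
    """
    decls = {}
    for part in style.split(";"):
        if ":" not in part:
            continue  # skip empty or malformed fragments
        prop, value = part.split(":", 1)
        # Property names are case-insensitive; collapse whitespace in values.
        decls[prop.strip().lower()] = " ".join(value.split())
    return "; ".join(f"{p}: {v}" for p, v in sorted(decls.items()))


def extract_stylesheet(pages, min_count=2):
    """Promote inline styles that repeat across the site to shared classes.

    pages: dict mapping a page name to the list of inline style strings
           found on that page (hypothetical output of step one).
    Returns (stylesheet_text, mapping from normalized style to class name).
    """
    counts = Counter(normalize(s) for styles in pages.values() for s in styles)
    classes = {}
    rules = []
    for style, n in sorted(counts.items()):
        if style and n >= min_count:
            name = f"c{len(classes) + 1}"  # hypothetical naming scheme
            classes[style] = name
            rules.append(f".{name} {{ {style} }}")
    return "\n".join(rules), classes


pages = {
    "index.html": ["width: 200px; float:left", "color: red"],
    "about.html": ["float: left; WIDTH: 200px;", "color: blue"],
}
css, classes = extract_stylesheet(pages)
print(css)
```

In this sketch only the style shared by both pages is moved into the common stylesheet, while styles that occur once stay local. A real clone detector would also need to handle near-miss matches, which exact comparison does not.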

