Extracting reusable document components for variable data printing
Source
Document Engineering
Proceedings of the 2007 ACM symposium on Document engineering
Winnipeg, Manitoba, Canada
SESSION: Variable data printing
Pages: 44-52
Year of Publication: 2007
ISBN: 978-1-59593-776-6
Authors
Steven R. Bagley, University of Nottingham
David F. Brailsford, University of Nottingham
James A. Ollis, University of Nottingham
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1284420.1284435

ABSTRACT

Variable Data Printing (VDP) has brought new flexibility and dynamism to the printed page. Every printed instance of a specific class of document can now have different degrees of customized content within the document template.

This flexibility comes at a cost. If every printed page is potentially different from all others, it must be rasterized separately, which is a time-consuming process. Technologies such as PPML (Personalized Print Markup Language) attempt to address this problem by dividing the bitmapped page into components that can be cached at the raster level, thereby speeding up the generation of page instances.
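
The idea of caching components at the raster level can be illustrated with a minimal Python sketch. It is not PPML and not the authors' tool: the `rasterize` function is a hypothetical stand-in for a real rasterizer, the component names are invented, and the "bitmap" it returns is a placeholder. The sketch only shows that components shared between page instances need to be rasterized once.

```python
from functools import lru_cache

# Counts how often the (assumed expensive) rasterizer actually runs.
raster_calls = 0


@lru_cache(maxsize=None)
def rasterize(component: str) -> tuple:
    """Hypothetical stand-in for a rasterizer: returns a fake 'bitmap'."""
    global raster_calls
    raster_calls += 1
    return tuple(ord(ch) % 2 for ch in component)


def render_page(components):
    """Compose one page instance from component bitmaps, reusing cached ones."""
    return [rasterize(c) for c in components]


# Two page instances share a logo and a footer; only the greeting varies.
pages = [
    ["logo", "Dear Alice", "footer"],
    ["logo", "Dear Bob", "footer"],
]
rendered = [render_page(p) for p in pages]
```

Under these assumptions, the six component placements across the two pages trigger only four rasterizations, because the shared logo and footer are served from the cache for the second page.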

A large number of documents are stored in Page Description Languages at a higher level of abstraction than the bitmapped page. Much of this content could be reused within a VDP environment provided that separable document components can be identified and extracted. These components then need to be individually rasterisable so that each high-level component can be related to its low-level (bitmap) equivalent. Unfortunately, the unstructured nature of most Page Description Languages makes it difficult to extract content easily.

This paper outlines the problems encountered in extracting component-based content from existing page description formats, such as PostScript, PDF and SVG, and how the differences between the formats affect the ease with which content can be extracted. The techniques are illustrated with reference to a tool called COG Extractor, which extracts content from PDF and SVG and prepares it for reuse.


