ABSTRACT
Document representations can rapidly become unwieldy if they try to encapsulate all possible document properties, ranging from abstract structure to detailed rendering and layout. We present a composite document approach wherein an XML-based document representation is linked via a 'shadow tree' of bi-directional pointers to a PDF representation of the same document. Using a two-window viewer, any material selected in the PDF can be related back to the corresponding material in the XML, and vice versa. In this way the treatment of specialist material such as mathematics, music or chemistry (e.g. via 'read aloud' or 'play aloud') can be activated via standard tools working within the XML representation, rather than requiring that application-specific structures be embedded in the PDF itself. The problems of textual recognition and tree pattern matching between the two representations are discussed in detail. Comparisons are drawn between our use of a shadow tree of pointers to map between document representations and the use of a code-replacement shadow tree in technologies such as XBL.
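As a rough illustration of the bi-directional mapping described above, the following minimal Python sketch models a shadow tree whose nodes each pair an XML address with a PDF region. It is not the implementation from the paper: the names `ShadowNode`, `ShadowTree`, `xml_to_pdf` and `pdf_to_xml`, the XPath-like address strings, and the `(page, x0, y0, x1, y1)` region format are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed region format: (page, x0, y0, x1, y1) in PDF user-space units.
Region = Tuple[int, float, float, float, float]


@dataclass
class ShadowNode:
    """Hypothetical shadow-tree node linking one XML node to one PDF region."""
    xml_path: str                  # XPath-like address into the XML tree
    pdf_region: Optional[Region]   # where the same material sits in the PDF
    children: List["ShadowNode"] = field(default_factory=list)


class ShadowTree:
    """Hypothetical lookup structure supporting both mapping directions."""

    def __init__(self, root: ShadowNode) -> None:
        self.root = root
        self._by_path: Dict[str, ShadowNode] = {}
        self._index(root)

    def _index(self, node: ShadowNode) -> None:
        # Build the XML-to-node index once, walking the whole tree.
        self._by_path[node.xml_path] = node
        for child in node.children:
            self._index(child)

    def xml_to_pdf(self, xml_path: str) -> Optional[Region]:
        """XML selection -> PDF region (None if the path is unknown)."""
        node = self._by_path.get(xml_path)
        return node.pdf_region if node else None

    def pdf_to_xml(self, page: int, x: float, y: float) -> Optional[str]:
        """PDF point -> XML path of the deepest node whose region contains it."""
        def visit(node: ShadowNode) -> Optional[str]:
            # Try children first so the most specific match wins.
            for child in node.children:
                hit = visit(child)
                if hit is not None:
                    return hit
            r = node.pdf_region
            if r and r[0] == page and r[1] <= x <= r[3] and r[2] <= y <= r[4]:
                return node.xml_path
            return None
        return visit(self.root)


# Usage with made-up coordinates for a one-page document.
root = ShadowNode("/article", (1, 0, 0, 612, 792), [
    ShadowNode("/article/para[1]", (1, 72, 600, 540, 700)),
    ShadowNode("/article/math[1]", (1, 72, 500, 300, 560)),
])
tree = ShadowTree(root)
print(tree.xml_to_pdf("/article/math[1]"))
print(tree.pdf_to_xml(1, 100, 520))
```

In this sketch a viewer would call `pdf_to_xml` when the user selects material in the PDF window and `xml_to_pdf` for the reverse direction, so that specialist handling (for example a MathML reader) can run on the XML node rather than on structures embedded in the PDF.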