GroundTruth tools & technology: applications in real world
Source: Document Engineering archive
Proceedings of the 2005 ACM symposium on Document engineering
Bristol, United Kingdom
DEMONSTRATION SESSION: Demonstrations
Pages: 223 - 224  
Year of Publication: 2005
ISBN:1-59593-240-2
Authors
Vinay Saxena  Hewlett-Packard TSG, TX
Sherif Yacoub  Hewlett-Packard Labs, Spain
Sponsors
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1096601.1096655

ABSTRACT

The process of creating digital archives from paper-based documents is gaining popularity. Automated systems and frameworks for document analysis have been developed, but they still fall short of the required accuracy goals in terms of text, article identification, etc. Rendering problems, such as missing graphical components, wrong reading order in multi-column journals and magazines, missing indentation, broken text lines, and hyphenation issues, are largely due to poor layout information extracted from the scanned document during the OCR process. Also lacking are tools that take the output of these processes and create highly accurate content, with associated metadata, from the original. In the current context, the term "Ground Truth" refers to the process (automatic and manual collectively) by which we ensure that the end result is highly accurate and complete rich text content (articles, papers, etc.) generated from the original scanned version of the content. We present PerfectDoc, a suite of tools for manual GroundTruthing. The suite consists of tools to create highly accurate GroundTruth, GT editors, and tools that take this data and deliver output suitable for web-based viewing.


