Metadata and data structures for the historical newspaper digital library
Source: Proceedings of the Eighth International Conference on Information and Knowledge Management (Conference on Information and Knowledge Management)
Kansas City, Missouri, United States
Pages: 147-153
Year of Publication: 1999
ISBN: 1-58113-146-1
Authors
Robert B. Allen  College of Library and Information Services, University of Maryland, College Park, MD
John Schalow  University Library, University of Maryland, College Park, MD
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGIR: ACM Special Interest Group on Information Retrieval
SIGMIS: ACM Special Interest Group on Management Information Systems
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 8,   Downloads (12 Months): 35,   Citation Count: 6
DOI: http://doi.acm.org/10.1145/319950.319971

ABSTRACT

We examine metadata and data-structure issues for the Historical Newspaper Digital Library. This project proposes to digitize several years' worth of historical newspapers and then apply OCR and linguistic processing to them. Newspapers are very complex information objects, so developing a rich description of their content is challenging. In addition to frameworks for the logical structure and physical layout, we propose metadata relevant to the image processing and to the historians who will use this collection. Finally, we consider how the metadata infrastructure might be managed as it evolves with improved text-processing capabilities, and how an infrastructure might be developed to support a community of users.


REFERENCES


1. Working Group 3: Structural and administrative metadata in page-image conversion projects: Discussion summary and recommendations. In TEI and XML in Digital Libraries. Washington, DC.
2. ALAM, H., CHANG, C. H., SHI, Z., AND TUPAJ, S. Extracting tables from printed documents. In Symposium on Document Image Understanding and Technology (1995), pp. 113-124.
3. BASKETTE, F. K., SISSORS, J. Z., AND BROOKS, J. S. The Art of Editing. Allyn and Bacon, 1996.
4. BUNKE, H., AND WANG, P. S. P. Handbook on Character Recognition and Document Image Analysis. World Scientific, 1997.
5.
6. DOCUMENT PROCESSING GROUP. Page decomposition and related research at the University of Maryland. In Symposium on Document Image Understanding and Technology (1995), pp. 39-55.
7. HARROWER, T. The Newspaper Designer's Handbook. McGraw Hill, 1997.
8. KANUNGO, T., AND ALLEN, R. B. Full-text access to historical newspapers. Tech. Rep. CS-TR-4014, Laboratory for Language and Media Processing, University of Maryland, Apr. 1999.
9. LIBRARY OF CONGRESS. Thesaurus for Graphical Objects. 1995.
10. SCHROTH, R. A. The Eagle and Brooklyn: A Community Newspaper, 1841-1955. Greenwood Press, Westport CT, 1974.
11. YANIKOGLU, B. A., AND VINCENT, L. Pink Panther: A complete environment for ground-truthing and benchmarking document page segmentation. Pattern Recognition 31 (September 1998), 1191-1204.

