| Metadata and data structures for the historical newspaper digital library |
| Source
|
Conference on Information and Knowledge Management
Proceedings of the eighth international conference on Information and knowledge management
Kansas City, Missouri, United States
Pages: 147 - 153
Year of Publication: 1999
ISBN:1-58113-146-1
|
|
Authors
|
|
Robert B. Allen
|
College of Library and Information Services, University of Maryland, College Park, MD
|
|
John Schalow
|
University Library, University of Maryland, College Park, MD
|
|
ABSTRACT
We examine metadata and data-structure issues for the Historical Newspaper Digital Library. This project proposes to digitize several years' worth of historical newspapers and then apply OCR and linguistic processing to them. Newspapers are very complex information objects, so developing a rich description of their content is challenging. In addition to frameworks for the logical structure and physical layout, we propose metadata relevant to the image processing and to the historians who will use this collection. Finally, we consider how the metadata infrastructure might be managed as it evolves with improved text processing capabilities and how an infrastructure might be developed to support a community of users.
INDEX TERMS
Primary Classification:
J. Computer Applications
J.7 COMPUTERS IN OTHER SYSTEMS
Subjects: Publishing
Additional Classification:
H. Information Systems
H.2 DATABASE MANAGEMENT
H.2.4 Systems
Subjects: Textual databases
H.3 INFORMATION STORAGE AND RETRIEVAL
I. Computing Methodologies
I.7 DOCUMENT AND TEXT PROCESSING
I.7.5 Document Capture
Subjects: Optical character recognition (OCR)
General Terms: Design, Documentation, Human Factors, Management, Measurement, Performance, Theory
Keywords: OCR, digital libraries, history, metadata, newspapers
|