ABSTRACT
This paper evaluates the automatic extraction of ten classes of named entities from a 19th-century newspaper: the Civil War years of the Richmond Times Dispatch, digitized by the University of Richmond with IMLS support. We analyze how successfully each of the ten entity classes prominent in these newspapers is identified, and the particular problems each class raises. Personal and place names are familiar, but some other important categories (such as ship names and military units) illustrate the challenges that named entity identification confronts as it evolves into a fundamental tool, not only for automatic metadata generation but also for searching and browsing. We conclude by suggesting the kinds of knowledge sources that digital libraries need to assemble as part of their machine-readable reference collections to support named entity identification as a core service.