ABSTRACT
Extracting and reading handwritten data from medical forms is an important task in medical informatics, as it paves the way for efficient archival, indexing, and retrieval. This paper addresses two challenges: (i) extracting handwritten text from images of carbon copies, and (ii) using context intelligently to reduce the lexicon so that handwriting recognition becomes tractable. We have developed a binarization algorithm targeted at carbon copy images that outperforms methods reported in the literature. The lexicon reduction method is trained on OCR output: it learns the medical concept of a form and, from that, the medical terms likely to appear in the narrative part that describes the patient's chief complaint. In our experiments, we worked with about 600 medical forms, 20 medical concepts, and a lexicon of 4,700 words. We observed that if the correct concept is among the top three choices, the lexicon can be reduced by two-thirds on an unseen form.
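The idea of concept-driven lexicon reduction can be illustrated with a small sketch. It is not the method of the paper: the scoring here is a plain word-overlap count, and the function names (`train_concept_terms`, `rank_concepts`, `reduce_lexicon`), the concept labels, and the toy training data are all hypothetical, chosen only to show how restricting recognition to the terms of the top-ranked concepts shrinks the lexicon.

```python
from collections import Counter, defaultdict


def train_concept_terms(forms):
    """Collect the terms seen under each concept.

    forms: iterable of (concept, words) pairs from labeled training forms
    (hypothetical stand-in for OCR output with known concepts).
    """
    concept_terms = defaultdict(Counter)
    for concept, words in forms:
        concept_terms[concept].update(w.lower() for w in words)
    return concept_terms


def rank_concepts(concept_terms, observed_words):
    """Rank concepts by how many observed words they contain (assumed scoring)."""
    observed = {w.lower() for w in observed_words}
    scores = {
        concept: sum(1 for w in observed if w in terms)
        for concept, terms in concept_terms.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


def reduce_lexicon(concept_terms, observed_words, top_k=3):
    """Return the union of terms belonging to the top_k ranked concepts."""
    reduced = set()
    for concept in rank_concepts(concept_terms, observed_words)[:top_k]:
        reduced.update(concept_terms[concept])
    return reduced


# Toy data, invented for illustration only.
training = [
    ("cardiac", ["chest", "pain", "pressure"]),
    ("respiratory", ["shortness", "breath", "wheezing"]),
    ("trauma", ["fall", "laceration", "pain"]),
    ("neurological", ["seizure", "dizziness", "headache"]),
]
model = train_concept_terms(training)
full_lexicon = set().union(*model.values())
reduced = reduce_lexicon(model, ["chest", "pain"], top_k=2)
print(len(full_lexicon), len(reduced))
```

In this toy setting the recognizer would only need to choose among the terms of the two best-matching concepts instead of the full lexicon. A real system would need a more robust concept classifier and would have to tolerate OCR errors in the observed words.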