Integrating geometrical and linguistic analysis for email signature block parsing
Source: ACM Transactions on Information Systems (TOIS)
Volume 17, Issue 4 (October 1999)
Pages: 343-366
Year of Publication: 1999
ISSN: 1046-8188
Authors
Hao Chen  Univ. of California at Berkeley, Berkeley
Jianying Hu  Lucent Technologies Bell Labs, Murray Hill, NJ
Richard W. Sproat  AT&T Labs—Research, Florham Park, NJ
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/326440.326442

ABSTRACT

The signature block is a common structured component of email messages. Accurate identification and analysis of signature blocks are important in many multimedia messaging and information retrieval applications, such as email text-to-speech rendering, automatic construction of personal address databases, and interactive message retrieval. The task is also very challenging, because signature blocks often appear in complex two-dimensional layouts that are guided only by loose conventions. Traditional text analysis methods designed for sequential text cannot handle two-dimensional structures, while the highly unconstrained nature of signature blocks makes two-dimensional grammars very difficult to apply. In this article, we describe an algorithm for signature block analysis that combines two-dimensional structural segmentation with one-dimensional grammatical constraints. The information obtained from both layout and linguistic analysis is integrated in the form of weighted finite-state transducers. The algorithm is currently implemented as a component in a preprocessing system for email text-to-speech rendering.
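
The combination of layout segmentation with sequential constraints can be illustrated with a small toy sketch. It is not the article's algorithm: the column splitter, the regular expressions, the label set, and every cost value below are assumptions chosen for illustration. Segmentation looks for vertical gutters of blank characters, and a Viterbi search over additive costs stands in for finding the cheapest path through a weighted finite-state machine.

```python
import re

# Hypothetical patterns and label set, chosen only for this illustration.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\(?\d{3}\)?[ .-]\d{3}[ .-]\d{4}")
NAME = re.compile(r"[A-Z][a-z]+(?: [A-Z]\.?)?(?: [A-Z][a-z]+)+")
LABELS = ["name", "email", "phone", "other"]


def column_spans(lines, min_gap=3):
    """Layout step: split at runs of >= min_gap columns that are blank on every line."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    blank = [all(row[i] == " " for row in padded) for i in range(width)]
    spans, start, gap = [], None, 0
    for i, is_blank in enumerate(blank):
        if is_blank:
            gap += 1
            if start is not None and gap == min_gap:
                spans.append((start, i - min_gap + 1))  # close span at gutter start
                start = None
        else:
            if start is None:
                start = i
            gap = 0
    if start is not None:
        spans.append((start, width))
    return spans


def read_segments(lines, min_gap=3):
    """Read the block column by column, top to bottom within each column."""
    segments = []
    for a, b in column_spans(lines, min_gap):
        for line in lines:
            text = line[a:b].strip()
            if text:
                segments.append(text)
    return segments


def emission_cost(label, text):
    """Linguistic evidence for one segment (lower is better; values are made up)."""
    if label == "email":
        return 0.0 if EMAIL.search(text) else 6.0
    if label == "phone":
        return 0.0 if PHONE.search(text) else 6.0
    if label == "name":
        return 1.0 if NAME.fullmatch(text) else 6.0
    return 2.0  # "other" is always possible at a moderate cost


def transition_cost(prev, cur):
    """Toy sequential constraint: a name is expected only as the first field."""
    return 4.0 if cur == "name" and prev is not None else 0.0


def best_labels(segments):
    """Viterbi search: costs add along a path, and the cheapest path wins."""
    best = {lab: (transition_cost(None, lab) + emission_cost(lab, segments[0]), [lab])
            for lab in LABELS}
    for text in segments[1:]:
        nxt = {}
        for cur in LABELS:
            cost, path = min(
                ((c + transition_cost(prev, cur), p) for prev, (c, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[cur] = (cost + emission_cost(cur, text), path + [cur])
        best = nxt
    return min(best.values(), key=lambda t: t[0])


if __name__ == "__main__":
    left = ["Jane Doe", "jane@example.org"]
    right = ["Example Labs", "(555) 123-4567"]
    block = [l.ljust(19) + r for l, r in zip(left, right)]  # two-column signature
    segments = read_segments(block)
    cost, labels = best_labels(segments)
    for text, label in zip(segments, labels):
        print(f"{label:6} {text}")
```

In this two-column example, a purely line-by-line reading would run the name and the organization together. Splitting on the gutter first keeps each field intact, and the transition penalty steers "Example Labs" toward the "other" label even though it matches the name pattern.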


