ABSTRACT
The signature block is a common structured component of email messages. Accurate identification and analysis of signature blocks are important in many multimedia messaging and information retrieval applications, such as email text-to-speech rendering, automatic construction of personal address databases, and interactive message retrieval. The task is also very challenging, because signature blocks often appear in complex two-dimensional layouts that are guided only by loose conventions. Traditional text analysis methods designed for sequential text cannot handle two-dimensional structures, while the highly unconstrained nature of signature blocks makes two-dimensional grammars very difficult to apply. In this article, we describe an algorithm for signature block analysis that combines two-dimensional structural segmentation with one-dimensional grammatical constraints. The information obtained from layout and linguistic analysis is integrated in the form of weighted finite-state transducers. The algorithm is currently implemented as a component of a preprocessing system for email text-to-speech rendering.
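To make the last step more concrete, the following Python sketch shows one way weighted finite-state transducers can merge the two kinds of evidence. It is an illustration under stated assumptions, not the authors' implementation. The line-shape symbols (`cap_words`, `digits`, `at_sign`), the field tags, the costs, and the helpers `WFST`, `compose`, and `best_labeling` are all hypothetical. A one-state "layout" transducer proposes candidate field tags for each line with a cost. A "grammar" transducer encodes a one-dimensional ordering constraint: a name first, then an optional title, then contact fields. The two are composed in the tropical semiring, where costs add along a path and the cheapest complete path wins.

```python
from collections import defaultdict


# Hypothetical, minimal epsilon-free weighted transducer over the tropical
# semiring (weights are costs: they add along a path, and the minimum wins).
class WFST:
    def __init__(self, start, finals, arcs):
        self.start = start
        self.finals = finals  # state -> final cost
        self.arcs = arcs      # state -> list of (input, output, cost, next_state)


def compose(a, b):
    """Compose a (x:y) with b (y:z) into a transducer x:z; arc costs add."""
    start = (a.start, b.start)
    arcs = defaultdict(list)
    finals = {}
    stack, seen = [start], {start}
    while stack:
        state = stack.pop()
        qa, qb = state
        if qa in a.finals and qb in b.finals:
            finals[state] = a.finals[qa] + b.finals[qb]
        for inp, mid, wa, na in a.arcs.get(qa, []):
            for mid_b, out, wb, nb in b.arcs.get(qb, []):
                if mid != mid_b:  # a's output label must match b's input label
                    continue
                nxt = (na, nb)
                arcs[state].append((inp, out, wa + wb, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return WFST(start, finals, dict(arcs))


def best_labeling(t, inputs):
    """Return (cost, outputs) of the cheapest accepting path for inputs, or None."""
    frontier = {t.start: (0.0, [])}
    for sym in inputs:
        nxt_frontier = {}
        for q, (cost, outs) in frontier.items():
            for inp, out, w, nq in t.arcs.get(q, []):
                if inp != sym:
                    continue
                cand = (cost + w, outs + [out])
                # Keep only the cheapest partial path into each state.
                if nq not in nxt_frontier or cand[0] < nxt_frontier[nq][0]:
                    nxt_frontier[nq] = cand
        frontier = nxt_frontier
    done = [(c + t.finals[q], outs) for q, (c, outs) in frontier.items() if q in t.finals]
    return min(done) if done else None


# Hypothetical layout evidence: each line shape maps to candidate tags with costs.
layout = WFST(0, {0: 0.0}, {0: [
    ("cap_words", "NAME", 1.0, 0),
    ("cap_words", "TITLE", 1.0, 0),  # layout alone cannot separate name from title
    ("digits", "PHONE", 0.5, 0),
    ("at_sign", "EMAIL", 0.2, 0),
]})

# Hypothetical 1-D grammar: NAME first, then an optional TITLE, then contact fields.
grammar = WFST(0, {1: 0.0, 2: 0.0}, {
    0: [("NAME", "NAME", 0.0, 1)],
    1: [("TITLE", "TITLE", 0.0, 2), ("PHONE", "PHONE", 0.0, 2), ("EMAIL", "EMAIL", 0.0, 2)],
    2: [("PHONE", "PHONE", 0.0, 2), ("EMAIL", "EMAIL", 0.0, 2)],
})

combined = compose(layout, grammar)
print(best_labeling(combined, ["cap_words", "cap_words", "digits", "at_sign"]))
```

Layout evidence alone gives the second capitalized line the same cost as a name or a title. After composition, the grammar rules out a second NAME, so the cheapest path labels the block NAME, TITLE, PHONE, EMAIL. A fuller system would also need epsilon transitions and richer layout features, which this sketch omits.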