| Automatic document orientation detection and categorization through document vectorization |
| Source
|
International Multimedia Conference
Proceedings of the 14th annual ACM international conference on Multimedia
Santa Barbara, CA, USA
POSTER SESSION: Short papers session 1
Pages: 113 - 116
Year of Publication: 2006
ISBN: 1-59593-447-2
|
|
Authors
|
|
Shijian Lu
|
National University of Singapore, Singapore
|
|
Chew Lim Tan
|
National University of Singapore, Singapore
|
|
ABSTRACT
This paper presents an automatic orientation detection and categorization technique that is capable of detecting the orientation of multilingual documents with arbitrary skew and categorizing document images according to the underlying languages. We carry out orientation detection and categorization through document vectorization, which encodes document orientation and language information and converts each document image into an electronic document vector through the exploitation of the density and distribution of vertical component runs. For each language of interest, a pair of vector templates is first constructed through a training process. Orientation and category of the query image are then determined based on distances between the query document vector and the constructed vector templates. Experiments over 492 testing document images show that the average orientation detection and categorization rates reach up to 97.56% and 99.59%, respectively.
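The pipeline described in the abstract (vectorize, build a pair of templates per language, pick the nearest template) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the choice of features (a histogram of vertical ink runs per column for density, and the band in which each run is centred for distribution), the reading of the template pair as upright versus 180-degree rotated, mean vectors as templates, and Euclidean distance. Skew handling is not attempted.

```python
import numpy as np

def vertical_run_vector(img, max_runs=4, bands=3):
    """Encode a binary image (1 = ink) as a feature vector.

    Hypothetical features, not the paper's exact definition:
      - density: histogram of the number of vertical ink runs per column
      - distribution: histogram of run-centre heights over horizontal bands
    """
    img = np.asarray(img, dtype=np.int8)
    height = img.shape[0]
    # Pad with background rows so every run has a detectable start and end.
    padded = np.pad(img, ((1, 1), (0, 0)))
    diff = np.diff(padded, axis=0)  # +1 at a run start, -1 just past a run end
    density = np.zeros(max_runs + 1)
    position = np.zeros(bands)
    for col in diff.T:
        starts = np.flatnonzero(col == 1)
        ends = np.flatnonzero(col == -1)  # exclusive end row of each run
        density[min(len(starts), max_runs)] += 1
        for centre in (starts + ends - 1) / 2.0:
            position[min(int(centre * bands / height), bands - 1)] += 1
    # Normalize each part so images of different sizes are comparable.
    return np.concatenate([density / max(density.sum(), 1),
                           position / max(position.sum(), 1)])

def build_templates(training):
    """Map {language: [upright binary images]} to one template per
    (language, orientation), using the mean vector as the template."""
    templates = {}
    for lang, images in training.items():
        templates[(lang, "upright")] = np.mean(
            [vertical_run_vector(im) for im in images], axis=0)
        templates[(lang, "rotated180")] = np.mean(
            [vertical_run_vector(np.rot90(im, 2)) for im in images], axis=0)
    return templates

def classify(query_vec, templates):
    """Return the (language, orientation) key of the nearest template."""
    return min(templates,
               key=lambda k: np.linalg.norm(query_vec - templates[k]))

# Toy data: "A" has one run near the top of each column, "B" has two runs.
script_a = np.zeros((9, 6), dtype=int)
script_a[0:2, :] = 1
script_b = np.zeros((9, 6), dtype=int)
script_b[0, :] = 1
script_b[4, :] = 1

templates = build_templates({"A": [script_a], "B": [script_b]})
query = vertical_run_vector(np.rot90(script_a, 2))
print(classify(query, templates))
```

In this toy setup, the run-count histogram separates the two synthetic "languages", while the band histogram separates upright from upside-down pages, so a single nearest-template lookup yields both the category and the orientation.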