Optical character recognition for typeset mathematics
Source: International Conference on Symbolic and Algebraic Computation
Proceedings of the International Symposium on Symbolic and Algebraic Computation
Oxford, United Kingdom
Pages: 348 - 353  
Year of Publication: 1994
ISBN:0-89791-638-7
Authors
Benjamin P. Berman  Computer Science Division, EECS Department, University of California at Berkeley
Richard J. Fateman  Computer Science Division, EECS Department, University of California at Berkeley
Sponsor
SIGSAM: ACM Special Interest Group on Symbolic and Algebraic Manipulation
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/190347.190438

ABSTRACT

There is a wealth of mathematical knowledge that could be very useful in many computational applications but is not available in electronic form. This knowledge comes in the form of mechanically typeset books and journals going back more than a hundred years. Besides these older sources, there are a great many current publications, filled with useful mathematical information, that are difficult if not impossible to obtain in electronic form. We would like to extract character information from these documents, which could then be passed to higher-level parsing routines for further extraction of mathematical content (or any other useful two-dimensional semantic content). Unfortunately, current commercial optical character recognition (OCR) software packages are unable to handle mathematical formulas, since their algorithms at all levels use heuristics developed for other document styles. We are concerned with developing OCR methods that can handle the specialized task of mathematical expression recognition.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
P. Chou, "Recognition of equations using a two-dimensional stochastic context-free grammar", SPIE Conf. on Visual Communications and Image Processing, Philadelphia, PA, Nov. 1989.
 
3
D. Bierens de Haan. Nouvelles Tables d'Intégrales Définies, edition of 1867 corrected, with an English Translation of the Introduction by J.F. Ritt. G. E. Stechert & Co., NY, 1939.
 
4
Richard J. Fateman. "Recognition and Parsing of Typeset Mathematics," UCB CS Division internal report, January 1994.
 
5
Richard J. Fateman and Theodore H. Einwohner. "Automated Integral Tables," UCB CS Division internal report, January 1994.
 
6
D.P. Huttenlocher, Gregory A. Klanderman, and William J. Rucklidge. "Comparing Images Using the Hausdorff Distance," Cornell Univ. CS Dept. Tech. Rpt.
 
7
Melvin Klerer and Fred Grossman. "Error Rates in Tables of Indefinite Integrals." Industrial Math. 18 (1968) 31-62. See also, by the same authors, A new table of indefinite integrals; computer processed Dover, New York, 1971.
 
8
G. Kopec and P. Chou, "Automatic generation of custom document image decoders", Proc. Second Intl. Conf. on Doc. Anal. and Recog., Tsukuba Science City, Japan, Oct. 20-22, 1993.
 
9
Nicholas Mitchell. "A Parser for Two-Dimensional OCR of Mathematics," UCB progress report 1993.
 
10

