Learning to recognize tables in free text
Source: Annual Meeting of the ACL
Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
College Park, Maryland
Pages: 443 - 450  
Year of Publication: 1999
ISBN:1-55860-609-3
Authors
Hwee Tou Ng  DSO National Laboratories, Singapore
Chung Yong Lim  DSO National Laboratories, Singapore
Jessica Li Teng Koo  DSO National Laboratories, Singapore
Publisher
Association for Computational Linguistics  Morristown, NJ, USA
Bibliometrics
Citation Count: 13
DOI Bookmark: 10.3115/1034678.1034746

ABSTRACT

Many real-world texts contain tables. In order to process these texts correctly and extract the information contained within the tables, it is important to identify the presence and structure of tables. In this paper, we present a new approach that learns to recognize tables in free text, including the boundary, rows and columns of tables. When tested on Wall Street Journal news documents, our learning approach outperforms a deterministic table recognition algorithm that identifies tables based on a fixed set of conditions. Our learning approach is also more flexible and easily adaptable to texts in different domains with different table characteristics.


