| Learning to recognize tables in free text |
| Source
|
Annual Meeting of the ACL
Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
College Park, Maryland
Pages: 443 - 450
Year of Publication: 1999
ISBN:1-55860-609-3
|
|
Authors
|
|
| Publisher |
Association for Computational Linguistics
Morristown, NJ, USA
|
| Bibliometrics |
Downloads (6 Weeks): 1, Downloads (12 Months): 25, Citation Count: 13
|
|
|
ABSTRACT
Many real-world texts contain tables. In order to process these texts correctly and extract the information contained within the tables, it is important to identify the presence and structure of tables. In this paper, we present a new approach that learns to recognize tables in free text, including the boundary, rows and columns of tables. When tested on Wall Street Journal news documents, our learning approach outperforms a deterministic table recognition algorithm that identifies tables based on a fixed set of conditions. Our learning approach is also more flexible and easily adaptable to texts in different domains with different table characteristics.
REFERENCES
1. Douglas Appelt and David Israel. 1997. Tutorial notes on building information extraction systems. Tutorial held at the Fifth Conference on Applied Natural Language Processing.
2. Shona Douglas and Matthew Hurst. 1996. Layout & language: Lists and tables in technical documents. In Proceedings of the ACL SIGPARSE Workshop on Punctuation in Computational Linguistics, pages 19--24.
3. Shona Douglas, Matthew Hurst, and David Quinn. 1995. Using natural language processing for identifying and interpreting tables in plain text. In Fourth Annual Symposium on Document Analysis and Information Retrieval, pages 535--545.
5. Richard Power and Donia Scott. 1999. Using layout for the generation, understanding or retrieval of documents. Call for participation at the 1999 AAAI Fall Symposium Series.
CITED BY 13
Ying Liu, Kun Bai, Prasenjit Mitra, C. Lee Giles, TableSeer: automatic table metadata extraction and searching in digital libraries, Proceedings of the 2007 conference on Digital libraries, June 18-23, 2007, Vancouver, BC, Canada
Jie Tang, Hang Li, Yunbo Cao, Zhaohui Tang, Email data cleaning, Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 21-24, 2005, Chicago, Illinois, USA
Aleksander Pivk, Philipp Cimiano, York Sure, Matjaz Gams, Vladislav Rajkovič, Rudi Studer, Transforming arbitrary tables into logical form with TARTAR, Data & Knowledge Engineering, v.60 n.3, p.567-595, March, 2007