| TEG: a hybrid approach to information extraction |
Source
Conference on Information and Knowledge Management
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Washington, D.C., USA
SESSION: KM-3 (knowledge management): knowledge extraction
Pages: 589 - 596
Year of Publication: 2004
ISBN: 1-58113-874-1
Authors
Benjamin Rosenfeld, Bar-Ilan University, Ramat Gan, Israel
Ronen Feldman, Bar-Ilan University, Ramat Gan, Israel
Moshe Fresko, Bar-Ilan University, Ramat Gan, Israel
Jonathan Schler, Bar-Ilan University, Ramat Gan, Israel
Yonatan Aumann, Bar-Ilan University, Ramat Gan, Israel
ABSTRACT
This paper describes a hybrid statistical and knowledge-based information extraction model that extracts entities and relations at the sentence level. The model attempts to retain and improve the high accuracy levels of knowledge-based systems while drastically reducing the amount of manual labor by relying on statistics drawn from a training corpus. The implementation of the model, called TEG (Trainable Extraction Grammar), can be adapted to any IE domain by writing a suitable set of rules in an extraction language based on SCFG (Stochastic Context-Free Grammar) and training them on an annotated corpus. The system does not contain any purely linguistic components, such as a PoS tagger or parser. We demonstrate the performance of the system on several named entity extraction and relation extraction tasks. The experiments show that our hybrid approach outperforms both purely statistical and purely knowledge-based systems, while requiring orders of magnitude less manual rule writing and a smaller amount of training data. The improvement in accuracy is slight for the named entity extraction tasks and more pronounced for relation extraction.
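To make the idea of training grammar rules from an annotated corpus concrete, the following is a minimal sketch of relative-frequency estimation for a stochastic context-free grammar. It is a generic illustration and not TEG's actual rule syntax or training procedure, which the abstract does not specify. The nonterminals (`Person`, `Title`), the rule counts, and the function name `estimate_rule_probabilities` are all hypothetical.

```python
from collections import defaultdict


def estimate_rule_probabilities(rule_counts):
    """Relative-frequency estimate for SCFG rules.

    P(lhs -> rhs) = count(lhs -> rhs) / total count of rules with that lhs.
    `rule_counts` maps (lhs, rhs_tuple) to how often the rule was used.
    """
    totals = defaultdict(int)
    for (lhs, _rhs), count in rule_counts.items():
        totals[lhs] += count
    return {
        (lhs, rhs): count / totals[lhs]
        for (lhs, rhs), count in rule_counts.items()
        if totals[lhs] > 0
    }


# Hypothetical counts of rule usage gathered from annotated sentences.
rule_counts = {
    ("Person", ("FirstName", "LastName")): 6,
    ("Person", ("Title", "LastName")): 3,
    ("Person", ("LastName",)): 1,
    ("Title", ("Dr.",)): 2,
    ("Title", ("Mr.",)): 2,
}

probs = estimate_rule_probabilities(rule_counts)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

In this sketch the hand-written part is the set of rules and the learned part is the probability attached to each one, which mirrors the division of labor the abstract describes between manual rule writing and corpus statistics.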