ABSTRACT
Most approaches to protein interaction mining from biomedical texts use both lexical and syntactic features. However, the individual impact of these two kinds of features on the effectiveness of the mining process has not yet been thoroughly studied. In this paper, we perform such a study on a recently published state-of-the-art support vector machine approach that uses both lexical and syntactic features. To this end, we strip this approach down to an algorithm that uses only a subset of the initial syntactic features. Next, we compare the original and the stripped-down method by evaluating them on 5 benchmark datasets as well as by performing 5 additional cross-dataset experiments. Although the original method exploits a very rich feature set including words, parts of speech, and grammatical relations, it is not significantly better than the stripped-down version; in fact, the former does not even consistently outperform the latter.