ABSTRACT
From the standpoint of automated extraction of scientific knowledge, the figures and their accompanying captions are an important but little-studied part of scientific publications. Captions are dense in information but also contain many extra-grammatical constructs, which makes them awkward to process with standard information extraction methods. We propose a scheme for "understanding" captions in biomedical publications by extracting and classifying "image pointers" (references to the accompanying image). We evaluate a number of automated methods for this task, including hand-coded methods, methods based on existing learning techniques, and methods based on novel learning techniques. The best of these methods yields a usefully accurate tool for caption understanding, with both recall and precision in excess of 94% on the most important single class in a combined extraction/classification task.