Understanding captions in biomedical publications
Source: International Conference on Knowledge Discovery and Data Mining
Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Washington, D.C.
Poster session: Research track
Pages: 499-504
Year of Publication: 2003
ISBN: 1-58113-737-0
Authors
William W. Cohen  Carnegie Mellon University, Pittsburgh, PA
Richard Wang  Carnegie Mellon University, Pittsburgh, PA
Robert F. Murphy  Carnegie Mellon University, Pittsburgh, PA
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/956750.956809

ABSTRACT

From the standpoint of automated extraction of scientific knowledge, figures and their accompanying captions are an important but little-studied part of scientific publications. Captions are dense in information but also contain many extra-grammatical constructs, which makes them awkward to process with standard information extraction methods. We propose a scheme for "understanding" captions in biomedical publications by extracting and classifying "image pointers" (references to the accompanying image). We evaluate a number of automated methods for this task, including hand-coded methods, methods based on existing learning techniques, and methods based on novel learning techniques. The best of these methods yields a usefully accurate tool for caption understanding, with both recall and precision in excess of 94% on the most important single class in a combined extraction/classification task.
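
To make the evaluation setting concrete, the following is a minimal sketch of how precision and recall might be scored on a combined extraction/classification task. It is not the authors' method: the regular expression, the class names `panel_label` and `other`, and the exact-match scoring rule (a prediction counts only if both its span and its class agree with the gold annotation) are all assumptions made for illustration.

```python
import re
from typing import NamedTuple, Set, Tuple


class Pointer(NamedTuple):
    """A candidate image pointer: character span plus a class label."""
    start: int
    end: int
    label: str


# Hypothetical hand-coded pattern: a single capital letter in parentheses,
# e.g. "(A)". Real captions use many more forms than this.
PANEL_LABEL = re.compile(r"\([A-Z]\)")


def extract_pointers(caption: str) -> Set[Pointer]:
    """Extract spans matching the toy pattern and tag them 'panel_label'."""
    return {
        Pointer(m.start(), m.end(), "panel_label")
        for m in PANEL_LABEL.finditer(caption)
    }


def precision_recall(predicted: Set[Pointer], gold: Set[Pointer]) -> Tuple[float, float]:
    """Score predictions against gold annotations.

    Assumed rule: a prediction is correct only when span AND label both
    match a gold pointer exactly, so extraction and classification errors
    are penalized together.
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    caption = "(A) Control cells. (B) Treated cells (arrow)."
    predicted = extract_pointers(caption)
    # Hypothetical gold annotation: the two panel labels plus "(arrow)",
    # which the toy pattern cannot find.
    arrow = caption.index("(arrow)")
    gold = predicted | {Pointer(arrow, arrow + len("(arrow)"), "other")}
    print(precision_recall(predicted, gold))
```

In this toy setup the pattern finds both panel labels but misses the "(arrow)" pointer, so precision is perfect while recall drops. Restricting both sets to a single label before scoring would give per-class figures like the single-class result the abstract reports.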

