Structured correspondence topic models for mining captioned figures in biological literature
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Paris, France
Session: Research track papers
Pages 39-48
Year of Publication: 2009
ISBN: 978-1-60558-495-9
Authors
Amr Ahmed  Carnegie Mellon University, Pittsburgh, PA, USA
Eric P. Xing  Carnegie Mellon University, Pittsburgh, PA, USA
William W. Cohen  Carnegie Mellon University, Pittsburgh, PA, USA
Robert F. Murphy  Carnegie Mellon University, Pittsburgh, PA, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1557019.1557031

ABSTRACT

Figures are a major source of information in scholarly articles from scientific journals, proceedings, and books, and often the most crucial and informative part: they directly provide images and other graphical illustrations of key experimental results and other scientific content. In biological articles, a typical figure comprises multiple panels accompanied by either scoped or global caption text. Moreover, the caption text contains important semantic entities, such as protein names, gene ontology, and tissue labels, that are relevant to the images in the figure. Owing to the avalanche of biological literature in recent years and the increasing popularity of various bio-imaging techniques, automatic retrieval and summarization of biological information from literature figures has emerged as a major unsolved challenge in computational knowledge extraction and management in the life sciences. We present a new structured probabilistic topic model, built on a realistic figure generation scheme, to model structurally annotated biological figures, and we derive an efficient inference algorithm based on collapsed Gibbs sampling for information retrieval and visualization. The resulting program constitutes one of the key IR engines in our SLIF system, which recently entered the final round (4 out of 70 competing systems) of the Elsevier Grand Challenge on Knowledge Enhancement in the Life Sciences. Here we present evaluations on a number of data mining tasks to illustrate our method.
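
The abstract names collapsed Gibbs sampling as the inference technique but does not spell out the sampler. As a rough illustration of the general idea only, the sketch below runs collapsed Gibbs sampling for plain latent Dirichlet allocation over bags of word ids. It is not the structured correspondence model described here: it has no panels, image features, or protein entities, and the function name, hyperparameters (`alpha`, `beta`), and toy corpus are all assumptions made for the example.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                        iterations=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (NOT the paper's structured model).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | everything else) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, assignments


# Hypothetical toy corpus: three "captions" over a five-word vocabulary.
toy_docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4], [0, 1, 3, 4]]
doc_topic, topic_word, z = collapsed_gibbs_lda(toy_docs, num_topics=2,
                                               vocab_size=5, iterations=50)
```

The topic parameters are integrated out, so the sampler only tracks count tables and resamples one token's topic at a time; a structured model would presumably add further count tables and conditionals for the other modalities.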


