BioMap: toward the development of a knowledge base of biomedical literature
Source: Symposium on Applied Computing
Proceedings of the 2004 ACM Symposium on Applied Computing
Nicosia, Cyprus
Session: Bioinformatics (BIO)
Pages: 121 - 127  
Year of Publication: 2004
ISBN:1-58113-812-1
Authors
Kamal Kumar  Indiana University Purdue University, Indianapolis, IN
Mathew J. Palakal  Indiana University Purdue University, Indianapolis, IN
Snehasis Mukhopadhyay  Indiana University Purdue University, Indianapolis, IN
Mathew J. Stephens  Indiana University School of Medicine, Indianapolis, IN
Huian Li  Indiana University Purdue University, Indianapolis, IN
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/967900.967927

ABSTRACT

Biological literature databases continue to grow rapidly, accumulating information that is vital to sound biomedical research. As this information space grows exponentially, researchers increasingly need to survey the published literature rapidly, synthesize it, and discover the "knowledge" embedded in it, so that they can conduct "informed" work, avoid repetition, and generate new hypotheses. Knowledge, in this case, is defined as one-to-many and many-to-many relationships among biological entities such as genes, proteins, drugs, and diseases. The knowledge discovery process involves identifying biological object names, resolving references, discovering ontologies and synonyms, and finally extracting object-object relationships. The overall goal of this work is to investigate and develop a complete knowledge base, called BioMap, from the entire MEDLINE collection of over 12 million bibliographic citations and author abstracts drawn from over 4600 biomedical journals worldwide, and to develop an interactive knowledge network through which users can access this secondary knowledge (BioMap) along with its primary databases such as MEDLINE. In this paper we present the organization of a distributed database system that maintains the BioMap knowledge base, together with preliminary results on the biological object name identification problem based on an initial set of 30,000 MEDLINE abstracts.
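
The abstract's notion of knowledge as many-to-many relationships among named entities can be illustrated with a small sketch. This is not the BioMap method, which the abstract does not specify. It is a toy that assumes a hand-written lexicon (the hypothetical `LEXICON` below, which also folds synonyms such as "p53" and "TP53" into one canonical name) and treats sentence-level co-occurrence as a stand-in for an object-object relationship. The function names `find_entities` and `build_relations` are likewise illustrative.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon: lowercase surface form -> (canonical name, entity type).
# A real system would draw on curated resources; these entries are assumptions.
LEXICON = {
    "p53": ("TP53", "gene"),
    "tp53": ("TP53", "gene"),
    "mdm2": ("MDM2", "gene"),
    "breast cancer": ("breast cancer", "disease"),
    "tamoxifen": ("tamoxifen", "drug"),
}


def find_entities(sentence):
    """Return the canonical entities whose surface forms occur in the sentence."""
    lowered = sentence.lower()
    found = set()
    for surface, entity in LEXICON.items():
        # Require non-alphanumeric boundaries so "p53" does not match inside "tp53".
        pattern = r"(?<![a-z0-9])" + re.escape(surface) + r"(?![a-z0-9])"
        if re.search(pattern, lowered):
            found.add(entity)
    return found


def build_relations(abstracts):
    """Map each entity to the set of entities it co-occurs with in some sentence."""
    graph = defaultdict(set)
    for text in abstracts:
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            for a, b in combinations(sorted(find_entities(sentence)), 2):
                graph[a].add(b)
                graph[b].add(a)
    return graph


abstracts = [
    "p53 is regulated by MDM2. Tamoxifen is used to treat breast cancer.",
    "TP53 mutations are common in breast cancer.",
]
graph = build_relations(abstracts)
for entity in sorted(graph):
    print(entity, "->", sorted(graph[entity]))
```

Storing the relations as a mapping from each entity to a set of neighbours makes both directions of a many-to-many relationship directly queryable. Co-occurrence alone says nothing about the type of relationship, which is why the process described above treats relationship extraction as a separate final step.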

