ABSTRACT
Biological literature databases continue to grow rapidly, and they hold information vital to sound biomedical research. As this body of data and information grows exponentially, researchers increasingly need to survey the published literature rapidly, synthesize it, and discover the "knowledge" embedded in it, so that they can conduct informed work, avoid repeating earlier studies, and generate new hypotheses. Here, knowledge is defined as the one-to-many and many-to-many relationships among biological entities such as genes, proteins, drugs, and diseases. The knowledge discovery process involves identifying biological object names, resolving references, discovering ontologies and synonyms, and finally extracting object-object relationships. The overall goal of this work is to investigate and develop a complete knowledge base, called BioMap, from the entire MEDLINE collection of over 12 million bibliographic citations and author abstracts drawn from more than 4,600 biomedical journals worldwide, and to develop an interactive knowledge network through which users can access this secondary knowledge (BioMap) together with its primary databases, such as MEDLINE. In this paper we present the organization of a distributed database system that maintains the BioMap knowledge base, along with preliminary results on the biological object name identification problem based on an initial set of 30,000 MEDLINE abstracts.
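The abstract names the stages of the knowledge discovery process without showing how they fit together. The following is a minimal sketch of two of them, name identification with synonym normalization and relationship extraction, written under loudly stated assumptions: the synonym table is a tiny hypothetical stand-in for curated nomenclature resources, and sentence-level co-occurrence is swapped in as the simplest possible relationship extractor. The names `SYNONYMS`, `identify_objects`, `extract_relations`, and `build_map` are illustrative only and are not taken from BioMap.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical synonym table: lower-cased surface form -> (canonical name, type).
# The entries are illustrative only; a real system would populate this from
# curated nomenclature and ontology resources.
SYNONYMS = {
    "p53": ("TP53", "gene"),
    "tp53": ("TP53", "gene"),
    "mdm2": ("MDM2", "gene"),
    "cisplatin": ("CISPLATIN", "drug"),
    "breast cancer": ("BREAST CANCER", "disease"),
}


def identify_objects(sentence):
    """Name identification: find dictionary terms and map each synonym to its canonical entity."""
    text = sentence.lower()
    found = set()
    for surface, entity in SYNONYMS.items():
        # Word boundaries keep "p53" from matching inside "tp53".
        if re.search(r"\b" + re.escape(surface) + r"\b", text):
            found.add(entity)
    return found


def extract_relations(abstracts):
    """Relationship extraction (co-occurrence stand-in): count entity pairs sharing a sentence."""
    relations = defaultdict(int)
    for abstract in abstracts:
        # Naive sentence split on terminal punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", abstract):
            # Sorting gives each pair a canonical order, so (A, B) and (B, A) share one count.
            entities = sorted(identify_objects(sentence))
            for a, b in combinations(entities, 2):
                relations[(a, b)] += 1
    return relations


def build_map(relations):
    """Store the pairs as an undirected adjacency map, i.e. a many-to-many relationship."""
    graph = defaultdict(set)
    for a, b in relations:
        graph[a[0]].add(b[0])
        graph[b[0]].add(a[0])
    return graph


if __name__ == "__main__":
    sample = ["MDM2 binds p53 and inhibits its activity. "
              "Cisplatin activates TP53 in breast cancer cells."]
    for entity, neighbours in sorted(build_map(extract_relations(sample)).items()):
        print(entity, "->", sorted(neighbours))
```

On the sample text, the synonyms "p53" and "TP53" resolve to the same object, which ends up linked to MDM2, CISPLATIN, and BREAST CANCER. Co-occurrence is deliberately crude. It records only that two objects appear together, not the nature of their relationship, and it skips reference resolution entirely, so a pronoun such as "its" is never linked back to p53.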
CITED BY 3
Anna Antony, Srilaxmi Basetty, Shielly Hartanto, and Mathew Palakal. Computational approach to biological validation of protein-protein interactions discovered using literature mining. Proceedings of the 2008 ACM Symposium on Applied Computing, March 16-20, 2008, Fortaleza, Ceara, Brazil.