| Probabilistic term variant generator for biomedical terms |
| Source
|
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
SESSION: Text representation
Pages: 167 - 173
Year of Publication: 2003
ISBN:1-58113-646-3
|
|
Authors
|
|
Yoshimasa Tsuruoka
|
CREST, JST (Japan Science and Technology Corporation), Saitama, Japan and University of Tokyo, Tokyo, Japan
|
|
Jun'ichi Tsujii
|
University of Tokyo, Tokyo, Japan and CREST, JST (Japan Science and Technology Corporation), Saitama, Japan
|
|
ABSTRACT
This paper presents an algorithm to generate possible variants for biomedical terms. The algorithm gives each variant its generation probability representing its plausibility, which is potentially useful for query and dictionary expansions. The probabilistic rules for generating variants are automatically learned from raw texts using an existing abbreviation extraction technique. Our method, therefore, requires no linguistic knowledge or labor-intensive natural language resource. We conducted an experiment using 83,142 MEDLINE abstracts for rule induction and 18,930 abstracts for testing. The results indicate that our method will significantly increase the number of retrieved documents for long biomedical terms.
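The idea of attaching a generation probability to each variant can be illustrated with a minimal sketch. It is not the authors' implementation: the rewrite rules, their probabilities, the function names, and the scoring (the product of rule probabilities along the best derivation, with the input term fixed at 1.0) are all assumptions made for illustration. In the paper the rules are learned automatically from raw text rather than written by hand.

```python
import heapq

# Hypothetical rewrite rules: (source substring, replacement, probability).
# These values are illustrative assumptions, not rules learned in the paper.
RULES = [
    ("-", " ", 0.3),
    (" ", "-", 0.1),
    ("kappa", "k", 0.2),
    ("1", "I", 0.05),
]


def apply_rules(term):
    """Yield (variant, rule_probability) for each single rule application."""
    for src, dst, prob in RULES:
        start = term.find(src)
        while start != -1:
            yield term[:start] + dst + term[start + len(src):], prob
            start = term.find(src, start + 1)


def generate_variants(term, threshold=0.01, max_variants=20):
    """Best-first expansion of variants scoring at least `threshold`.

    Assumption: a variant's score is the product of the rule
    probabilities on its highest-scoring derivation from `term`.
    """
    best = {term: 1.0}        # best known score per variant
    heap = [(-1.0, term)]     # max-heap emulated with negated scores
    results = []
    while heap and len(results) < max_variants:
        neg_p, current = heapq.heappop(heap)
        p = -neg_p
        if p < best[current]:
            continue  # stale entry; a better derivation was already expanded
        results.append((current, p))
        for variant, rule_p in apply_rules(current):
            new_p = p * rule_p
            if new_p >= threshold and new_p > best.get(variant, 0.0):
                best[variant] = new_p
                heapq.heappush(heap, (-new_p, variant))
    return results


if __name__ == "__main__":
    for variant, p in generate_variants("NF-kappa B"):
        print(f"{p:.3f}  {variant}")
```

Because variants come out in order of decreasing score, a query or dictionary expansion step could stop at any cutoff and still keep the most plausible variants.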