Probabilistic term variant generator for biomedical terms
Full text: PDF (165 KB)
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Toronto, Canada
SESSION: Text representation
Pages: 167 - 173  
Year of Publication: 2003
ISBN:1-58113-646-3
Authors
Yoshimasa Tsuruoka  CREST, JST (Japan Science and Technology Corporation), Saitama, Japan and University of Tokyo, Tokyo, Japan
Jun'ichi Tsujii  University of Tokyo, Tokyo, Japan and CREST, JST (Japan Science and Technology Corporation), Saitama, Japan
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 5,   Downloads (12 Months): 44,   Citation Count: 3
DOI: http://doi.acm.org/10.1145/860435.860467

ABSTRACT

This paper presents an algorithm that generates possible variants of biomedical terms. The algorithm assigns each variant a generation probability that reflects its plausibility, which is potentially useful for query expansion and dictionary expansion. The probabilistic rules for generating variants are learned automatically from raw text using an existing abbreviation extraction technique. The method therefore requires neither linguistic knowledge nor labor-intensive natural language resources. We conducted an experiment using 83,142 MEDLINE abstracts for rule induction and 18,930 abstracts for testing. The results indicate that the method can significantly increase the number of documents retrieved for long biomedical terms.
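
The idea of generating variants with plausibility scores can be illustrated with a minimal sketch. It assumes that rules are simple substring rewrites, each with a probability, and that a variant's score is the product of the probabilities of the rules used to reach it. The rule table, the threshold, and the names `RULES`, `apply_rule`, and `generate_variants` are hypothetical and chosen for illustration; the abstract does not give the paper's actual rule format or learned values.

```python
import heapq

# Hypothetical rewrite rules: (pattern, replacement, probability).
# These values are illustrative only and are not taken from the paper.
RULES = [
    ("-", " ", 0.3),
    (" ", "-", 0.1),
    ("alpha", "a", 0.2),
    ("kappa", "k", 0.2),
]


def apply_rule(term, old, new):
    """Yield every string obtained by replacing one occurrence of `old`."""
    start = term.find(old)
    while start != -1:
        yield term[:start] + new + term[start + len(old):]
        start = term.find(old, start + 1)


def generate_variants(term, rules=RULES, threshold=0.01):
    """Return {variant: probability} for variants scoring at least `threshold`.

    Variants are expanded best-first, so the first time a string is popped
    it carries the highest product of rule probabilities that reaches it.
    """
    best = {}
    heap = [(-1.0, term)]  # max-heap via negated probabilities
    while heap:
        neg_p, current = heapq.heappop(heap)
        p = -neg_p
        if current in best:
            continue  # already finalized with an equal or higher probability
        best[current] = p
        for old, new, rule_p in rules:
            q = p * rule_p
            if q < threshold:
                continue  # prune implausible variants; also bounds the search
            for variant in apply_rule(current, old, new):
                if variant not in best:
                    heapq.heappush(heap, (-q, variant))
    return best
```

Under these toy rules, `generate_variants("NF-kappa B")` keeps the original term at probability 1.0 and assigns "NF kappa B" a probability of 0.3. Ranked variants like these could then be added to a query or a dictionary in order of plausibility.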



