Learning word sense disambiguation in biomedical text with difference between training and test distributions
Source
Conference on Information and Knowledge Management
Proceedings of the Third International Workshop on Data and Text Mining in Bioinformatics
Hong Kong, China
SESSION: Bio-text mining
Pages: 59-66
Year of Publication: 2009
ISBN: 978-1-60558-803-2
Authors
Jeong-Woo Son  Kyungpook National University, Daegu, South Korea
Seong-Bae Park  Kyungpook National University, Daegu, South Korea
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1651318.1651330

ABSTRACT

Word sense disambiguation (WSD) is a crucial issue in biomedical text mining, since the performance of diverse biomedical text mining techniques depends strongly on the senses of lexicons. It is therefore natural to treat lexicons as the most important features in WSD. However, because the lexical space is so diverse, machine-learning WSD methods that rely on lexical features suffer from a difference between the distributions of the training and test documents. To tackle this problem, this paper proposes support vector machines with example-wise weights. In this method, the training distribution is made to coincide with the test distribution by weighting each training example according to its similarity to all test data. The experimental results show that a distribution change between training and test data does occur, and that the proposed method, which accounts for this change in its training phase, outperforms ordinary support vector machines.
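
The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the similarity measure, the feature representation, or the optimizer, so everything here is an assumption made for illustration: bag-of-words counts stand in for lexical features, the weight of a training example is its mean cosine similarity to all test documents (rescaled to average 1), and a plain subgradient-descent linear SVM stands in for the actual learner. The names `bag_of_words`, `example_weights`, `train_weighted_svm`, and `predict` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Token counts for one document (assumed stand-in for lexical features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def example_weights(train_docs, test_docs):
    """Weight each training document by its mean cosine similarity to all
    test documents, rescaled so the weights average to 1 (an assumption)."""
    train_vecs = [bag_of_words(d) for d in train_docs]
    test_vecs = [bag_of_words(d) for d in test_docs]
    if not test_vecs:
        return [1.0] * len(train_vecs)
    raw = [sum(cosine(tr, te) for te in test_vecs) / len(test_vecs)
           for tr in train_vecs]
    total = sum(raw)
    if total == 0:
        return [1.0] * len(raw)  # no overlap at all: fall back to uniform
    return [r * len(raw) / total for r in raw]


def train_weighted_svm(vectors, labels, weights, epochs=20, lr=0.1, reg=0.01):
    """Linear SVM without bias, trained by subgradient descent on
    reg/2 * ||w||^2 + sum_i c_i * max(0, 1 - y_i * <w, x_i>),
    where c_i is the example-wise weight and y_i is +1 or -1."""
    w = {}
    for _ in range(epochs):
        for x, y, c in zip(vectors, labels, weights):
            margin = y * sum(w.get(t, 0.0) * v for t, v in x.items())
            for t in w:  # gradient step on the regularizer
                w[t] *= 1.0 - lr * reg
            if margin < 1:  # hinge loss is active, scaled by the weight c
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * c * y * v
    return w


def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    score = sum(w.get(t, 0.0) * v for t, v in x.items())
    return 1 if score >= 0 else -1
```

Training examples that resemble the test documents contribute more to the hinge loss, so the learned boundary is pulled toward the test distribution. With an off-the-shelf learner the same weights could be supplied per example, for instance through the `sample_weight` argument of scikit-learn's `SVC.fit`.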


