Corpus microsurgery: criteria optimization for medical cross-language IR
Source
Conference on Information and Knowledge Management
Proceeding of the 17th ACM conference on Information and knowledge management
Napa Valley, California, USA
POSTER SESSION: Poster session 1/information retrieval
Pages 1365-1366  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Monica Rogati  LinkedIn, Mountain View, CA, USA
Yiming Yang  Carnegie Mellon University, Pittsburgh, PA, USA
Jaime Carbonell  Carnegie Mellon University, Pittsburgh, PA, USA
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458281

ABSTRACT

Automatic subset selection from a parallel corpus significantly improves cross-lingual information retrieval (CLIR) performance and also increases its efficiency. Our selection method extracts relevant training data by combining lexical criteria with additional ones (i.e., estimated corpus quality, taxonomy projection, and size). The challenge lies in combining these criteria into a meaningful scoring function that can be used to rank parallel sentence candidates. We choose the weighted geometric mean for its soft-AND properties, and we optimize the criteria weights by wrapping the CLIR task in an optimization shell. Because the convexity properties of the search space are indeterminate, we have explored continuous reactive tabu search (CRTS), a global optimization method. We use a large parallel corpus in the medical domain to examine the effect of the adaptation criteria and their combination on CLIR performance. In our experiments, 100 selected sentences yield 90% of the performance obtained with 5,000 times more in-domain parallel sentences. Our optimized criteria weights considerably outperform the uniform distribution baseline, as well as lexical similarity adaptation.
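
The soft-AND behavior of a weighted geometric mean can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the three criteria in the example, and all score values are hypothetical, and the sketch assumes each criterion has already been scaled to the range [0, 1]. It shows only the scoring and ranking step, not the weight optimization.

```python
import math


def weighted_geometric_mean(scores, weights):
    """Combine per-criterion scores in [0, 1] using non-negative weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(s < 0 for s in scores) or any(w < 0 for w in weights):
        raise ValueError("scores and weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # A zero score on any weighted criterion zeroes the result (the AND part).
    if any(s == 0 and w > 0 for s, w in zip(scores, weights)):
        return 0.0
    # Work in log space: exp(sum(w_i * log(s_i)) / sum(w_i)).
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights) if w > 0)
    return math.exp(log_sum / total)


def rank_candidates(candidates, weights):
    """Sort (sentence_id, scores) pairs from best to worst combined score."""
    return sorted(
        candidates,
        key=lambda c: weighted_geometric_mean(c[1], weights),
        reverse=True,
    )


# Hypothetical scores: (lexical similarity, corpus quality, taxonomy match).
candidates = [
    ("balanced", (0.6, 0.6, 0.6)),
    ("lopsided", (0.9, 0.9, 0.05)),
]
uniform = (1.0, 1.0, 1.0)  # the uniform-weights baseline
ranked = [name for name, _ in rank_candidates(candidates, uniform)]
print(ranked)  # ['balanced', 'lopsided']
```

With uniform weights, the lopsided candidate has the higher arithmetic mean (about 0.62 versus 0.60) but a much lower geometric mean (about 0.34 versus 0.60), so one weak criterion pulls it down the ranking. Changing the weights shifts how strongly each criterion can penalize a candidate, which is what the weight optimization tunes.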


