Automatic transliteration for Japanese-to-English text retrieval
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Toronto, Canada
SESSION: Cross-lingual information retrieval
Pages: 353 - 360  
Year of Publication: 2003
ISBN:1-58113-646-3
Authors
Yan Qu  Clairvoyance Corporation, Pittsburgh, PA
Gregory Grefenstette  Clairvoyance Corporation, Pittsburgh, PA
David A. Evans  Clairvoyance Corporation, Pittsburgh, PA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/860435.860499

ABSTRACT

For cross-language information retrieval (CLIR) based on bilingual translation dictionaries, good performance depends on the lexical coverage of the dictionary. This is especially true for language pairs that share few cognates, such as Japanese and English. In this paper, we describe a method for automatically creating and validating candidate Japanese transliterations of English words. A phonetic English dictionary and a set of probabilistic mapping rules are used to generate transliteration candidates automatically. A monolingual Japanese corpus is then used to validate the transliterated terms automatically. We evaluate the extracted English-Japanese transliteration pairs in Japanese-to-English retrieval experiments over the CLEF bilingual test collections. Extending a bilingual translation dictionary with our automatically derived pairs improves average precision, both before and after pseudo-relevance feedback, with gains ranging from 2.5% to 64.8%.
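
The generate-then-validate idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the phonetic entries, the sound units, the mapping probabilities, and the corpus counts below are all made-up assumptions, and real mapping rules would need to handle context and syllable structure. The sketch only shows the general shape: expand each sound unit into weighted katakana options, score each combination by the product of its rule probabilities, then keep the candidates that occur in a monolingual Japanese corpus.

```python
from itertools import product

# Hypothetical phonetic dictionary: English word -> pre-chunked sound units.
# (Assumption: the chunking is given; the paper's dictionary format is not shown.)
PHONETIC_DICT = {
    "bus": ["B-AH", "S"],
    "taxi": ["T-AE", "K", "S-IY"],
}

# Hypothetical probabilistic mapping rules: sound unit -> [(katakana, probability)].
# The probabilities are invented for illustration only.
MAPPING_RULES = {
    "B-AH": [("バ", 0.8), ("ブ", 0.2)],
    "S": [("ス", 0.9), ("ズ", 0.1)],
    "T-AE": [("タ", 1.0)],
    "K": [("ク", 0.7), ("キ", 0.3)],
    "S-IY": [("シー", 0.6), ("シ", 0.4)],
}

# Toy term counts standing in for a monolingual Japanese corpus.
CORPUS_COUNTS = {"バス": 120, "タクシー": 45}


def generate_candidates(word):
    """Return (katakana, probability) pairs, most probable first."""
    options = [MAPPING_RULES[unit] for unit in PHONETIC_DICT[word]]
    candidates = []
    for combo in product(*options):
        kana = "".join(k for k, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p  # treat the rules as independent
        candidates.append((kana, prob))
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def validate(candidates, corpus_counts, min_count=1):
    """Keep only candidates attested at least min_count times in the corpus."""
    return [kana for kana, _ in candidates
            if corpus_counts.get(kana, 0) >= min_count]


if __name__ == "__main__":
    for word in PHONETIC_DICT:
        print(word, validate(generate_candidates(word), CORPUS_COUNTS))
```

Validation against a corpus is what lets the generator over-produce: implausible strings cost little because unattested candidates are simply discarded before they reach the translation dictionary.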


