ABSTRACT
For cross-language information retrieval (CLIR) based on bilingual translation dictionaries, good performance depends on the dictionary's lexical coverage. This is especially true for language pairs with few cognates, such as Japanese and English. In this paper, we describe a method for automatically creating and validating candidate Japanese transliterations of English words. A phonetic English dictionary and a set of probabilistic mapping rules are used to generate transliteration candidates, and a monolingual Japanese corpus is then used to validate them. We evaluate the extracted English-Japanese transliteration pairs in Japanese-to-English retrieval experiments over the CLEF bilingual test collections. Our automatically derived extension to a bilingual translation dictionary improves average precision, both before and after pseudo-relevance feedback, with gains ranging from 2.5% to 64.8%.