ABSTRACT
Bilingual dictionaries are commonly used for query translation in cross-language information retrieval (CLIR). However, we are faced with the problem of translation selection. Several recent studies have suggested using term co-occurrences for this selection. This paper presents two extensions to improve these co-occurrence-based approaches. First, we extend the basic co-occurrence model by adding a decaying factor that decreases the mutual information as the distance between the terms increases. Second, we incorporate a triple translation model that integrates syntactic dependence relations (represented as triples). Our evaluation of translation accuracy shows that translating triples as units is more precise than word-by-word translation. Our CLIR experiments show that adding the decaying factor leads to substantial improvements over the basic co-occurrence model, and that the triple translation model brings further improvements.
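The decaying factor described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the exponential form, the rate `alpha`, the function names, and all numbers below are assumptions made for illustration only, not the authors' model. The sketch scales a precomputed (positive) mutual information score by a factor that shrinks with term distance, then picks the candidate translation with the highest total score against its context.

```python
import math

def decaying_factor(distance, alpha=0.8):
    # Assumed exponential form: 1.0 for adjacent terms (distance 1),
    # smaller as the terms sit farther apart. alpha is a hypothetical rate.
    return math.exp(-alpha * (distance - 1))

def decayed_mi(mi, distance, alpha=0.8):
    # Scale a precomputed, positive mutual information score by the factor.
    return mi * decaying_factor(distance, alpha)

def select_translation(candidates, context, mi_table, alpha=0.8):
    # candidates: candidate translations of one query term
    # context: (context_term, distance_to_query_term) pairs
    # mi_table: maps (candidate, context_term) to a mutual information score
    def score(candidate):
        return sum(
            decayed_mi(mi_table.get((candidate, term), 0.0), dist, alpha)
            for term, dist in context
        )
    return max(candidates, key=score)

# Toy, made-up scores: one ambiguous term with two candidate translations.
candidates = ["financial institution", "riverbank"]
context = [("loan", 1), ("water", 4)]
mi_table = {
    ("financial institution", "loan"): 1.0,
    ("riverbank", "water"): 1.5,
}

with_decay = select_translation(candidates, context, mi_table, alpha=0.8)
without_decay = select_translation(candidates, context, mi_table, alpha=0.0)
print(with_decay, "|", without_decay)
```

With `alpha=0.0` the factor is always 1, so the sketch reduces to a plain co-occurrence score and the distant context term dominates; with a positive `alpha`, the adjacent term carries more weight.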
CITED BY 27
Wei Gao, Cheng Niu, Jian-Yun Nie, Ming Zhou, Jian Hu, Kam-Fai Wong, Hsiao-Wuen Hon, Cross-lingual query suggestion using query logs of different languages, Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 23-27, 2007, Amsterdam, The Netherlands
Dong Zhou, Mark Truran, Tim Brailsford, Helen Ashman, James Goulding, Gcon: a graph-based technique for resolving ambiguity in query translation candidates, Proceedings of the 2008 ACM symposium on Applied computing, March 16-20, 2008, Fortaleza, Ceara, Brazil
INDEX TERMS
Primary Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
Additional Classification:
F. Theory of Computation
F.1 COMPUTATION BY ABSTRACT DEVICES
F.1.2 Modes of Computation
Subjects: Probabilistic computation
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.2 User Interfaces (D.2.2, H.1.2, I.3.6)
Subjects: Natural language
General Terms: Algorithms, Design, Documentation, Experimentation, Languages, Measurement, Performance, Reliability, Theory
Keywords: CLIR, co-occurrence, parse, query translation, statistical model