ABSTRACT
This paper presents a novel statistical model for cross-language information retrieval. Given a written query in the source language, documents in the target language are ranked by integrating probabilities computed by two statistical models: a query-translation model, which generates the most probable term-by-term translations of the query, and a query-document model, which evaluates the likelihood of each document given a translation. The two scores are integrated over the set of the N most probable translations of the query. Experimental results for N = 1, 5, and 10 are presented on the Italian-English bilingual track data used in the CLEF 2000 and 2001 evaluation campaigns.
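The integration over N-best translations can be illustrated with a minimal sketch. The abstract does not give the exact combination formula, so the sum of products used here is an assumption, and every name (`rank_documents`, `nbest`, `doc_scores`) and every probability value is hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Translation = Tuple[str, ...]


def rank_documents(
    nbest: List[Tuple[Translation, float]],
    doc_scores: Dict[str, Dict[Translation, float]],
) -> List[Tuple[str, float]]:
    """Rank documents by combining two scores over the N-best translations.

    Assumption: a document's score is the sum, over the supplied translations,
    of (translation probability) * (query-document score for that translation).
    """
    ranked = []
    for doc_id, per_translation in doc_scores.items():
        total = sum(
            p_trans * per_translation.get(translation, 0.0)
            for translation, p_trans in nbest
        )
        ranked.append((doc_id, total))
    # Highest combined score first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical N-best list from a query-translation model (most probable first).
t1: Translation = ("economic", "crisis")
t2: Translation = ("economical", "crisis")
nbest = [(t1, 0.6), (t2, 0.3)]

# Hypothetical query-document scores for each (document, translation) pair.
doc_scores = {
    "d1": {t1: 0.02, t2: 0.0},
    "d2": {t1: 0.015, t2: 0.03},
}

ranking_n1 = rank_documents(nbest[:1], doc_scores)  # N = 1
ranking_n2 = rank_documents(nbest[:2], doc_scores)  # N = 2
```

With these made-up numbers, using only the single best translation ranks `d1` first, while adding the second translation moves `d2` to the top, which is the kind of effect that varying N is meant to expose.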