Evaluating a probabilistic model for cross-lingual information retrieval
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
New Orleans, Louisiana, United States
Pages: 105 - 110  
Year of Publication: 2001
ISBN: 1-58113-331-6
Authors
Jinxi Xu  BBN Technologies, Cambridge, MA
Ralph Weischedel  BBN Technologies, Cambridge, MA
Chanh Nguyen  BBN Technologies, Cambridge, MA
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/383952.383968

ABSTRACT

This work proposes and evaluates a probabilistic cross-lingual retrieval system. The system uses a generative model to estimate the probability that a document in one language is relevant, given a query in another language. An important component of the model is translation probabilities from terms in documents to terms in a query. Our approach is evaluated when 1) the only resource is a manually generated bilingual word list, 2) the only resource is a parallel corpus, and 3) both resources are combined in a mixture model. The combined resources produce about 90% of monolingual performance in retrieving Chinese documents. For Spanish, the system achieves 85% of monolingual performance using only a pseudo-parallel Spanish-English corpus. Retrieval results are comparable with those of the structural query translation technique (Pirkola, 1998) when bilingual lexicons are used for query translation. When parallel texts are used in addition to conventional lexicons, the system achieves better retrieval results but requires more computation than the structural query translation technique. It also produces slightly better results than using a machine translation (MT) system for cross-lingual information retrieval (CLIR), but the improvement over the MT system is not significant.
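
The abstract does not state the scoring formula, so the following is only a minimal sketch of how a generative cross-lingual model with translation probabilities and a mixture of two resources could be computed. Everything here is an assumption made for illustration: the function names `mix_translation` and `log_query_likelihood`, the linear interpolation with `weight`, the background smoothing weight `alpha`, and the toy probability tables are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mix_translation(p_lexicon, p_parallel, weight=0.5):
    """Interpolate two tables of P(query_term | doc_term).

    Assumption: a simple linear mixture. A term found in only one
    resource keeps only its weighted share of probability mass.
    """
    mixed = {}
    for doc_term in set(p_lexicon) | set(p_parallel):
        lex = p_lexicon.get(doc_term, {})
        par = p_parallel.get(doc_term, {})
        mixed[doc_term] = {
            q: weight * lex.get(q, 0.0) + (1 - weight) * par.get(q, 0.0)
            for q in set(lex) | set(par)
        }
    return mixed


def log_query_likelihood(query, doc_terms, p_translate, p_background, alpha=0.3):
    """Log-probability that a document generates a query in another language.

    Each query term is generated either from a background model (weight
    alpha) or by picking a document term by relative frequency and
    translating it into the query language.
    """
    counts = Counter(doc_terms)
    total = sum(counts.values())
    score = 0.0
    for q in query:
        translated = 0.0
        if total:
            translated = sum(
                (n / total) * p_translate.get(d, {}).get(q, 0.0)
                for d, n in counts.items()
            )
        # A tiny floor for unseen query terms keeps the logarithm defined.
        p = alpha * p_background.get(q, 1e-9) + (1 - alpha) * translated
        score += math.log(p)
    return score


# Toy, made-up tables for a Spanish document and an English query.
lexicon = {"gato": {"cat": 1.0}}
parallel = {"gato": {"cat": 0.5, "kitten": 0.5}, "perro": {"dog": 1.0}}
table = mix_translation(lexicon, parallel, weight=0.5)
background = {"cat": 0.01, "dog": 0.01}
print(log_query_likelihood(["cat"], ["gato", "gato"], table, background))
```

Documents would then be ranked by this score for a given query. Under these assumptions, a document about "gato" scores higher for the query "cat" than one about "perro".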


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Allan, J., Callan, J., Feng, F-F, and Malin, D. 2000. "INQUERY at TREC8." In TREC8 Proceedings, Special publication by NIST, 2000.
2
3
 
4
 
5
6
 
7
Hull, D. 1997. "Using structured queries for disambiguation in cross-language information retrieval." In AAAI Symposium on Cross-Language Text and Speech Retrieval, 1997.
 
8
Klavans, J. and Hovy, E. 1999. "Multilingual (or Crosslingual) Information Retrieval." Chapter 2, Multilingual Information Management, current levels and future abilities. Editors, E. Hovy, N. Ide, R. Frederking, J. Mariani and A. Zampolli, April, 1999.
9
 
10
Kwok, K.L. 2000. "TREC9 Cross-language, question-answering track experiments using PIRCS." TREC9 Proceedings published by NIST, 2000.
 
11
Lafferty, J. 1999. Personal communications.
 
12
Levow, G.A. and Oard, D. 1999. "Evaluating lexical coverage for cross-language information retrieval." In Workshop on Multilingual Information Processing and Asian Language Processing, Beijing, 1999.
 
13
14
 
15
16
17
 
18
Porter, M. 1980. "An algorithm for suffix stripping." Program 14, 3(1980), pages 130-137.
 
19
Rabiner, L. 1989. "A tutorial on Hidden Markov models and selected applications in speech recognition." In Proceedings of IEEE 77, pages 257-286, 1989.
20
21
 
22
Voorhees, E. and Harman, D. 1997. TREC-5 Proceedings. NIST special publication, 1997.
 
23
Voorhees, E. and Harman, D. 2000. TREC-9 Proceedings. To be published by NIST.
24
