Cross-lingual relevance models
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Tampere, Finland
Session: Cross-language Information Retrieval
Pages: 175-182
Year of Publication: 2002
ISBN: 1-58113-561-0
Authors
Victor Lavrenko  Center for Intelligent Information Retrieval, University of Massachusetts, Amherst, MA
Martin Choquette  Center for Intelligent Information Retrieval, University of Massachusetts, Amherst, MA
W. Bruce Croft  Center for Intelligent Information Retrieval, University of Massachusetts, Amherst, MA
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/564376.564408

ABSTRACT

We propose a formal model of Cross-Language Information Retrieval that does not rely on either query translation or document translation. Our approach leverages recent advances in language modeling to directly estimate an accurate topic model in the target language, starting with a query in the source language. The model integrates popular techniques of disambiguation and query expansion in a unified formal framework. We describe how the topic model can be estimated with either a parallel corpus or a dictionary. We test the framework by constructing Chinese topic models from English queries and using them in the CLIR task of TREC9. The model achieves performance around 95% of the strong mono-lingual baseline in terms of average precision. In initial precision, our model outperforms the mono-lingual baseline by 20%. The main contribution of this work is the unified formal model which integrates techniques that are essential for effective Cross-Language Retrieval.
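
To make the parallel-corpus idea concrete, the following is a minimal sketch of how a target-language topic model might be estimated from a source-language query. It is an illustration under stated assumptions, not the estimator from the paper: the function names (unigram_model, target_topic_model), the uniform prior over document pairs, the uniform-interpolation smoothing with weight mu, and the toy corpus of romanized placeholder tokens are all hypothetical. Each aligned document pair is weighted by how well its source side explains the query, and that weight is then spread over the words on its target side.

```python
from collections import Counter

def unigram_model(tokens, vocab, mu=0.5):
    """Unigram model smoothed by interpolating with a uniform
    distribution over vocab (an assumed, simple smoothing scheme).
    Assumes tokens is non-empty."""
    counts = Counter(tokens)
    n = len(tokens)
    return {w: (1 - mu) * counts[w] / n + mu / len(vocab) for w in vocab}

def target_topic_model(query, pairs):
    """Estimate P(target word | source query) from aligned pairs.

    pairs is a list of (source_tokens, target_tokens) tuples standing
    in for a document-aligned parallel corpus. A uniform prior over
    pairs is assumed.
    """
    src_vocab = {w for src, _ in pairs for w in src} | set(query)
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    scores = dict.fromkeys(tgt_vocab, 0.0)
    for src, tgt in pairs:
        src_lm = unigram_model(src, src_vocab)
        tgt_lm = unigram_model(tgt, tgt_vocab)
        # Likelihood of the query under this pair's source-side model.
        likelihood = 1.0
        for q in query:
            likelihood *= src_lm[q]
        # Spread that weight over the pair's target-side words.
        for w in tgt_vocab:
            scores[w] += likelihood * tgt_lm[w]
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

# Toy corpus: English source side, romanized placeholder target side.
pairs = [
    (["oil", "price"], ["shiyou", "jiage"]),
    (["football", "match"], ["zuqiu", "bisai"]),
]
model = target_topic_model(["oil"], pairs)
```

In this toy setup the query word "oil" appears only in the first pair, so the target words of that pair receive more probability mass than those of the second. No word-by-word translation of the query takes place; the mapping between languages comes entirely from the alignment of the document pairs.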


