Simultaneous multilingual search for translingual information retrieval
Source
Conference on Information and Knowledge Management archive
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Napa Valley, California, USA
SESSION: IR: multilingual & multimedia
Pages 719-728
Year of Publication: 2008
ISBN: 978-1-59593-991-3
Authors
Kristen Parton  Columbia University, New York, NY, USA
Kathleen R. McKeown  Columbia University, New York, NY, USA
James Allan  University of Massachusetts Amherst, Amherst, MA, USA
Enrique Henestroza  Columbia University, New York, NY, USA
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458179

ABSTRACT

We consider the problem of translingual information retrieval, where monolingual searchers issue queries in a different language than the document language(s) and the results must be returned in the language they know, the query language. We present a framework for translingual IR that integrates document translation and query translation into the retrieval model. The corpus is represented as an aligned, jointly indexed "pseudo-parallel" corpus, where each document contains the text of the document along with its translation into the query language. The queries are formulated as multilingual structured queries, where each query term and its translations into the document language(s) are treated as synonym sets. This model leverages simultaneous search in multiple languages against jointly indexed documents to improve the accuracy of results over search using document translation or query translation alone. For query translation, we compared a statistical machine translation (SMT) approach to a dictionary-based approach. We found that using a Wikipedia-derived dictionary for named entities combined with an SMT-based dictionary worked better than SMT alone. Simultaneous multilingual search also has other important features suited to translingual search, since it can provide an indication of poor document translation when a match with the source document is found. We show how close integration of CLIR and SMT allows us to improve result translation in addition to IR results.
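The two core ideas in the abstract, jointly indexing each document with its translation and treating a query term and its translations as one synonym set, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class and function names are hypothetical, the English/Spanish example data is invented, and scoring by summed raw term frequency is a stand-in, since the abstract does not specify the paper's retrieval model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PseudoParallelDoc:
    """One entry of a jointly indexed corpus (hypothetical structure)."""
    doc_id: str
    source_tokens: list[str]       # original text, in the document language
    translated_tokens: list[str]   # its translation into the query language

    def joint_counts(self) -> Counter:
        # Joint index: source and translation are counted together.
        return Counter(self.source_tokens) + Counter(self.translated_tokens)


def synset_tf(counts: Counter, synset: set[str]) -> int:
    # All members of a synonym set are treated as the same term.
    return sum(counts[term] for term in synset)


def score(doc: PseudoParallelDoc, query: list[set[str]]) -> int:
    # Assumption: raw term-frequency sum, not the paper's actual model.
    counts = doc.joint_counts()
    return sum(synset_tf(counts, synset) for synset in query)


def translation_suspect(doc: PseudoParallelDoc, query: list[set[str]]) -> bool:
    # Flags a possibly poor document translation: some synonym set matches
    # the source text but nothing in the translated text.
    src = Counter(doc.source_tokens)
    tgt = Counter(doc.translated_tokens)
    return any(synset_tf(src, s) > 0 and synset_tf(tgt, s) == 0 for s in query)


# Invented example: English query "president visit", Spanish documents.
query = [{"president", "presidente"}, {"visit", "visita"}]

d1 = PseudoParallelDoc(
    "d1",
    ["el", "presidente", "visita", "madrid"],
    ["the", "chairman", "visits", "madrid"],   # translation misses both terms
)
d2 = PseudoParallelDoc(
    "d2",
    ["el", "clima", "en", "madrid"],
    ["the", "weather", "in", "madrid"],
)
d3 = PseudoParallelDoc(
    "d3",
    ["presidente", "habla"],
    ["president", "speaks"],
)

for doc in (d1, d2, d3):
    print(doc.doc_id, score(doc, query), translation_suspect(doc, query))
```

In this sketch, d1 is still retrieved even though its translation contains neither query term, because the translated query terms match the source text; the same mismatch is what marks its translation as suspect.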

