Towards effective strategies for monolingual and bilingual information retrieval: Lessons learned from NTCIR-4
Source: ACM Transactions on Asian Language Information Processing (TALIP)
Volume 4, Issue 2 (June 2005)
Pages: 78-110
Year of Publication: 2005
ISSN: 1530-0226
Authors
Yan Qu, Clairvoyance Corporation, Pittsburgh, PA
David A. Hull, Clairvoyance Corporation, Pittsburgh, PA
Gregory Grefenstette, Clairvoyance Corporation, Pittsburgh, PA
David A. Evans, Clairvoyance Corporation, Pittsburgh, PA
Motoko Ishikawa, Justsystem Corporation, Tokushima-city, Japan
Setsuko Nara, Justsystem Corporation, Tokushima-city, Japan
Toshiya Ueda, Justsystem Corporation, Tokushima-city, Japan
Daisuke Noda, Justsystem Corporation, Tokushima-city, Japan
Kousaku Arita, Justsystem Corporation, Tokushima-city, Japan
Yuki Funakoshi, Justsystem Corporation, Tokushima-city, Japan
Hiroshi Matsuda, Justsystem Corporation, Tokushima-city, Japan
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1105696.1105698

ABSTRACT

At the NTCIR-4 workshop, Justsystem Corporation (JSC) and Clairvoyance Corporation (CC) collaborated in the cross-language retrieval task (CLIR). Our goal was to evaluate the performance and robustness of our recently developed commercial-grade CLIR systems for English and Asian languages. The main contribution of this article is the investigation of different strategies, their interactions in both monolingual and bilingual retrieval tasks, and their respective contributions to operational retrieval systems in the context of NTCIR-4. We report results of Japanese and English monolingual retrieval and results of Japanese-to-English bilingual retrieval. In monolingual retrieval analysis, we examine two special properties of the NTCIR experimental design (two levels of relevance and identical queries in multiple languages) and explore how they interact with strategies of our retrieval system, including pseudo-relevance feedback, multi-word term down-weighting, and term weight merging strategies. Our analysis shows that the choice of language (English or Japanese) does not have a significant impact on retrieval performance. Query expansion is slightly more effective with relaxed judgments than with rigid judgments. For better retrieval performance, weights of multi-word terms should be lowered. In the bilingual retrieval analysis, we aim to identify robust strategies that are effective when used alone and when used in combination with other strategies. We examine cross-lingual specific strategies such as translation disambiguation and translation structuring, as well as general strategies such as pseudo-relevance feedback and multi-word term down-weighting. For shorter title topics, pseudo-relevance feedback is a major performance enhancer, but translation structuring affects retrieval performance negatively when used alone or in combination with other strategies. All experimented strategies improve retrieval performance for the longer description topics, with pseudo-relevance feedback and translation structuring as the major contributors.



REVIEW

Reviewer: Dagobert Soergel

This substantial paper will be very useful for researchers working in automated information retrieval (IR), but not for a general audience. It describes, in great detail, techniques for both monolingual IR in English and Japanese, and Japanese-Eng...
