Adapting pivoted document-length normalization for query size: Experiments in Chinese and English
Source: ACM Transactions on Asian Language Information Processing (TALIP)
Volume 5, Issue 3 (September 2006)
Pages: 245-263
Year of Publication: 2006
ISSN: 1530-0226
Authors
Tze Leung Chung  The Hong Kong Polytechnic University
Robert Wing Pong Luk  The Hong Kong Polytechnic University
Kam Fai Wong  The Chinese University of Hong Kong
Kui Lam Kwok  Queens College, City University of New York
Dik Lun Lee  The Hong Kong University of Science and Technology
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1194936.1194941

ABSTRACT

The vector space model (VSM) is one of the most widely used information retrieval (IR) models in both academia and industry. It was less effective at the Chinese ad hoc retrieval tasks than other retrieval models in the NTCIR-3 evaluation workshop, but comparable to those in the NTCIR-4 and NTCIR-5 workshops. We do not know whether the lower level performance was due to the VSM's inherent deficiencies or to a less effective normalization of document length. Hence we evaluated the VSM with various pivoted normalizations of document length using the NTCIR-3 collection for confirmation. We found that VSM's retrieval effectiveness with pivoted normalization was comparable to other competitive retrieval models (for example, 2-Poisson), and that VSM's retrieval speed with pivoted normalization was similar to competitive retrieval models (2-Poisson). We proposed a novel adaptive scheme that automatically estimates the (near) best parameters for pivoted document-length normalization based on query size; the new normalization is called adaptive pivoted document-length normalization. This scheme achieved good retrieval effectiveness, sometimes for short (title) queries and sometimes for long queries, without manually adjusting parameter values. We found that unique, adaptive pivoted normalization can enhance fixed pivoted normalizations for different test collections (TREC-5 and TREC-6). We also evaluated the VSM with the adaptive pivoted normalization using the pseudo-relevance feedback (PRF) and found that this type of VSM performs similarly to the competitive retrieval models (2-Poisson) with PRF. Hence, we conclude that the VSM with unique (adaptive) pivoted document-length normalization is effective for Chinese IR and that its retrieval effectiveness is comparable to that of other competitive retrieval models with or without PRF for the reference test collections used in this evaluation.
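
To make the idea of pivoted document-length normalization concrete, here is a minimal Python sketch. It uses the commonly cited pivoted form, (1 - slope) + slope * (document length / average document length), with the pivot taken to be the average document length. The abstract does not give the paper's formula for estimating parameters from query size, so `slope_for_query`, its default values, and the assumption that the slope rises with query length are purely hypothetical placeholders for illustration, as is the simple tf-idf scoring function.

```python
def pivoted_norm(doc_len, avg_doc_len, slope):
    """Pivoted length-normalization factor (pivot = average document length).

    The factor is 1.0 for a document of average length, above 1.0 for
    longer documents, and below 1.0 for shorter ones.
    """
    return (1.0 - slope) + slope * (doc_len / avg_doc_len)


def slope_for_query(query_len, short_slope=0.2, long_slope=0.4, long_at=30):
    """HYPOTHETICAL adaptive rule: interpolate the slope from query size.

    All defaults and the direction of the trend are assumptions made for
    this sketch; they are not the estimates reported in the paper.
    """
    frac = min(max(query_len, 0), long_at) / long_at
    return short_slope + frac * (long_slope - short_slope)


def score(query_terms, doc_tf, doc_len, avg_doc_len, idf):
    """Simple tf-idf sum divided by the query-adapted pivoted factor."""
    slope = slope_for_query(len(query_terms))
    norm = pivoted_norm(doc_len, avg_doc_len, slope)
    raw = sum(doc_tf.get(t, 0) * idf.get(t, 0.0) for t in query_terms)
    return raw / norm


if __name__ == "__main__":
    idf = {"retrieval": 1.5, "chinese": 2.0}
    query = ["chinese", "retrieval"]
    short_doc = score(query, {"chinese": 2, "retrieval": 1}, 50, 100, idf)
    long_doc = score(query, {"chinese": 2, "retrieval": 1}, 400, 100, idf)
    print(short_doc, long_doc)
```

With identical term frequencies, the longer document receives the lower score, and a larger slope strengthens that penalty; the adaptive idea is that the slope is chosen per query rather than fixed by hand.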


