A novel statistical Chinese language model and its application in pinyin-to-character conversion
Source
Conference on Information and Knowledge Management
Proceedings of the 17th ACM Conference on Information and Knowledge Management
Napa Valley, California, USA
POSTER SESSION: Poster session 2/information retrieval
Pages 1433-1434
Year of Publication: 2008
ISBN: 978-1-59593-991-3
Authors
Bo Lin  Nanyang Technological University, Singapore, Singapore
Jun Zhang  Nanyang Technological University, Singapore, Singapore
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1458082.1458318

ABSTRACT

In this paper, we present a novel Chinese language model and study its applications, in particular Chinese pinyin-to-character conversion. In the new model, each word is associated with a supporting context constructed by mining the frequent sets of nearby phrases and their distances to the word. Such information was usually overlooked by previous n-gram models and their variants. We apply the model to Chinese pinyin-to-character conversion and find that it offers a better solution for Chinese input. In our evaluation, the model has lower perplexity and higher prediction accuracy than the state-of-the-art n-gram Markov model for the Chinese language.
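
The idea of a supporting context can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the mining algorithm, so the function name `mine_supporting_context`, the `window` and `min_support` parameters, and the toy corpus are all assumptions made for illustration. The sketch also simplifies "frequent sets" to single (neighbor, distance) pairs counted within a fixed window, where a full frequent-itemset miner would consider combinations of such pairs.

```python
from collections import Counter, defaultdict


def mine_supporting_context(sentences, window=3, min_support=2):
    """Map each word to the (neighbor, offset) pairs seen at least min_support times.

    offset is the signed distance from the word to the neighbor
    (negative means the neighbor precedes the word).
    Hypothetical helper; simplified to single pairs rather than itemsets.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][(tokens[j], j - i)] += 1
    # Keep only pairs that meet the support threshold.
    return {
        word: {pair: n for pair, n in ctx.items() if n >= min_support}
        for word, ctx in counts.items()
    }


# Toy, pre-segmented corpus (assumed for illustration only).
corpus = [
    ["我", "喜欢", "喝", "茶"],
    ["他", "喜欢", "喝", "咖啡"],
    ["我", "喝", "茶"],
]
context = mine_supporting_context(corpus, window=2, min_support=2)
print(context["喝"])
```

In a pinyin-to-character setting, pairs like these could be used to rescore candidate characters for an ambiguous syllable according to which candidates are supported by words already present at the observed distances; how the paper actually combines this evidence with n-gram probabilities is not stated in the abstract.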

