Named entity transliteration with comparable corpora
Source Annual Meeting of the ACL archive
Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics
Sydney, Australia
Pages: 73 - 80  
Year of Publication: 2006
Authors
Richard Sproat  University of Illinois at Urbana-Champaign, Urbana, IL
Tao Tao  University of Illinois at Urbana-Champaign, Urbana, IL
ChengXiang Zhai  University of Illinois at Urbana-Champaign, Urbana, IL
Publisher
Association for Computational Linguistics  Morristown, NJ, USA
DOI: 10.3115/1220175.1220185

ABSTRACT

In this paper we investigate Chinese-English name transliteration using comparable corpora: corpora in which texts in the two languages deal with some of the same topics, and therefore share references to named entities, but are not translations of each other. We present two distinct methods for transliteration, one based on phonetic transliteration and the other on the temporal distribution of candidate pairs. Each approach works quite well on its own, but combining them achieves even better results. We then propose a novel score propagation method that utilizes the co-occurrence of transliteration pairs within document pairs. This propagation method achieves further improvement over the best results from the previous step.
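
To make the temporal-distribution idea concrete, here is a minimal sketch of how a candidate name pair might be scored by comparing how often each name appears per day in its own corpus. The abstract does not say which similarity measure or combination rule the authors use, so the Pearson correlation, the linear interpolation, and all names below (`normalize`, `temporal_score`, `combined_score`, `weight`) are assumptions made purely for illustration.

```python
from math import sqrt


def normalize(counts):
    """Turn raw per-day counts into relative frequencies (all zeros if empty)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def temporal_score(en_counts, zh_counts):
    """Assumed temporal similarity: correlation of normalized daily frequencies."""
    return pearson(normalize(en_counts), normalize(zh_counts))


def combined_score(phonetic, temporal, weight=0.5):
    """Hypothetical combination: linear interpolation of the two scores."""
    return weight * phonetic + (1 - weight) * temporal


# Two names that peak on the same day correlate strongly.
english_name_counts = [0, 5, 1, 0]
chinese_name_counts = [0, 10, 2, 0]
score = temporal_score(english_name_counts, chinese_name_counts)
```

Under these assumptions, a pair whose mentions rise and fall on the same days receives a high temporal score even when the phonetic evidence is weak, which is the intuition behind combining the two signals.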


