Named entity transliteration with comparable corpora
Source
Annual Meeting of the ACL
Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics
Sydney, Australia
Pages: 73-80
Year of Publication: 2006
Authors
Richard Sproat, University of Illinois at Urbana-Champaign, Urbana, IL
Tao Tao, University of Illinois at Urbana-Champaign, Urbana, IL
ChengXiang Zhai, University of Illinois at Urbana-Champaign, Urbana, IL
Publisher
Association for Computational Linguistics
Morristown, NJ, USA
Bibliometrics
Downloads (6 Weeks): 4, Downloads (12 Months): 49, Citation Count: 10
ABSTRACT
In this paper we investigate Chinese-English name transliteration using comparable corpora: corpora in which the texts in the two languages cover some of the same topics, and therefore share references to named entities, but are not translations of each other. We present two distinct methods for transliteration, one based on phonetic transliteration and the other on the temporal distribution of candidate pairs. Each approach works quite well on its own, but combining them achieves even better results. We then propose a novel score propagation method that exploits the co-occurrence of transliteration pairs within document pairs. This propagation method further improves on the best results from the previous step.
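The abstract names four ingredients: a phonetic score, a temporal score, a combination of the two, and score propagation over document pairs. The sketch below illustrates each one on toy data. It is not the authors' implementation, and several pieces are assumptions:

- Plain Levenshtein edit distance over romanized strings stands in for the phonetic model.
- Pearson correlation of daily frequency series stands in for the temporal similarity measure.
- Linear interpolation is one simple way to combine the two scores.
- Iterative averaging over co-occurring pairs is a simplified form of propagation.

All function names and the `weight`, `alpha`, and `iterations` parameters are hypothetical.

```python
import math
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


def phonetic_score(a, b):
    """Assumed stand-in for a phonetic model: 1 - normalized edit distance."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest


def pearson(xs, ys):
    """Pearson correlation of two equal-length frequency series; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def combined_score(phon, temp, weight=0.5):
    """Hypothetical linear interpolation of phonetic and temporal scores."""
    return weight * phon + (1 - weight) * temp


def propagate(scores, doc_pairs, alpha=0.5, iterations=10):
    """Simplified propagation (an assumption, not the paper's formula).

    scores: {candidate_pair: initial score}
    doc_pairs: iterable of collections, each holding the candidate pairs
               found in one aligned (Chinese, English) document pair.
    Each pair's score is mixed with the mean current score of the other
    candidate pairs it co-occurs with; isolated pairs keep their score.
    """
    neighbors = defaultdict(set)
    for pairs in doc_pairs:
        for p in pairs:
            neighbors[p].update(q for q in pairs if q != p)
    current = dict(scores)
    for _ in range(iterations):
        updated = {}
        for p, s in current.items():
            nbrs = [current[q] for q in neighbors[p] if q in current]
            support = sum(nbrs) / len(nbrs) if nbrs else s
            updated[p] = alpha * scores[p] + (1 - alpha) * support
        current = updated
    return current


if __name__ == "__main__":
    # Made-up toy data: a romanized Chinese name, an English candidate,
    # and hypothetical daily mention counts over five days.
    phon = phonetic_score("kelindun", "clinton")
    temp = pearson([0, 3, 5, 1, 0], [1, 4, 6, 2, 0])
    print(combined_score(phon, temp))
```

Pearson correlation can be negative, so the combined score here is not confined to [0, 1]. A fuller system might clip or rank-normalize the two scores before combining them. The propagation step rewards a candidate pair whose co-occurring pairs in the same document pairs also score highly. This is the intuition the abstract describes, reduced here to its simplest averaging form.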
REFERENCES
[3] A. Carlson, C. Cumby, J. Rosen, and D. Roth. 1999. The SNoW learning architecture. Technical Report UIUCDCS-R-99-2101, UIUC CS Dept.
[4] Martin Franz, J. Scott McCarley, and Salim Roukos. 1998. Ad hoc and multilingual information retrieval at IBM. In Text REtrieval Conference, pages 104-115.
[6] W. Gao, K.-F. Wong, and W. Lam. 2004. Phoneme-based transliteration of foreign names for OOV problem. In IJCNLP, pages 374-381, Sanya, Hainan.
[10] J. Kruskal. 1999. An overview of sequence comparison. In D. Sankoff and J. Kruskal, editors, Time Warps, String Edits, and Macromolecules, chapter 1, pages 1-44. CSLI, 2nd edition.
[11] X. Li, P. Morie, and D. Roth. 2004. Robust reading: Identification and tracing of ambiguous names. In NAACL-2004.
[12] J. Lin. 1991. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145-151.
[13] H. Masuichi, R. Flournoy, S. Kaufmann, and S. Peters. 2000. A bootstrapping method for extracting bilingual text pairs.
[14] H. M. Meng, W. K. Lo, B. Chen, and K. Tang. 2001. Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval. In Proceedings of the Automatic Speech Recognition and Understanding Workshop.
[21] Tao Tao, Su-Youn Yoon, Andrew Fister, Richard Sproat, and ChengXiang Zhai. 2006. Unsupervised named entity transliteration using temporal and phonetic correlation. In EMNLP 2006, Sydney, July.
[22] P. Taylor, A. Black, and R. Caley. 1998. The architecture of the Festival speech synthesis system. In Proceedings of the Third ESCA Workshop on Speech Synthesis, pages 147-151, Jenolan Caves, Australia.
CITED BY 10
Xuanhui Wang, ChengXiang Zhai, Xiao Hu, Richard Sproat. Mining correlated bursty topic patterns from coordinated text streams. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 12-15, 2007, San Jose, California, USA.
Tao Tao, Su-Youn Yoon, Andrew Fister, Richard Sproat, ChengXiang Zhai. Unsupervised named entity transliteration using temporal and phonetic correlation. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, July 22-23, 2006, Sydney, Australia.
Mehdi M. Kashani, Eric Joanis, Roland Kuhn, George Foster, Fred Popowich. Integration of an Arabic transliteration module into a statistical machine translation system. Proceedings of the Second Workshop on Statistical Machine Translation, pp. 17-24, June 23, 2007, Prague, Czech Republic.