Crawling English-Japanese person-name transliterations from the web
Source
International World Wide Web Conference archive
Proceedings of the 18th international conference on World wide web
Madrid, Spain
POSTER SESSION: Thursday, April 23, 2009
Pages 1151-1152
Year of Publication: 2009
ISBN: 978-1-60558-487-4
Author
Satoshi Sato, Nagoya University, Nagoya, Japan
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1526709.1526902

ABSTRACT

Automatic compilation of lexicons is a dream of lexicon compilers and lexicon users alike. This paper proposes a system that crawls English-Japanese person-name transliterations from the Web and works as a back-end collector for the automatic compilation of a bilingual person-name lexicon. Our crawler collected 561K transliterations in five months. From these, an English-Japanese person-name lexicon with 406K entries was compiled by automatic post-processing. This lexicon is much larger than other similar resources, including the English-Japanese lexicon of HeiNER, which was obtained from Wikipedia.
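The abstract does not say how transliterations are recognized on a page or what the post-processing does. The Python below is a minimal sketch under stated assumptions. It assumes that a pair appears as a Katakana name followed by a parenthesized English name (for example ジョン・スミス（John Smith）). It also assumes that post-processing merges pairs sharing a normalized English spelling. The pattern, the helper names `extract_pairs` and `compile_lexicon`, and the sample pages are all hypothetical and are not the author's method.

```python
import re
import unicodedata
from collections import defaultdict

# ASSUMPTION (hypothetical pattern): a run of Katakana characters (block
# U+30A0-U+30FF, which includes the middle dot "・" and the long-vowel mark "ー")
# followed by a parenthesized English name of two or more capitalized words.
# Text is NFKC-normalized first, so full-width parentheses "（）" become "()".
PAIR_PATTERN = re.compile(
    r"([\u30A0-\u30FF]+)"                              # Katakana transliteration
    r"\s*\(\s*"                                        # opening parenthesis
    r"([A-Z][A-Za-z.'-]*(?:\s+[A-Z][A-Za-z.'-]*)+)"    # English name, 2+ words
    r"\s*\)"                                           # closing parenthesis
)


def extract_pairs(text: str) -> list[tuple[str, str]]:
    """Return (english, japanese) pairs found in one page (hypothetical helper)."""
    text = unicodedata.normalize("NFKC", text)
    return [(en, ja) for ja, en in PAIR_PATTERN.findall(text)]


def compile_lexicon(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Merge pairs into a lexicon keyed by the whitespace-collapsed English name."""
    lexicon: dict[str, set[str]] = defaultdict(set)
    for en, ja in pairs:
        key = " ".join(en.split())  # "John  Smith" and "John Smith" share one entry
        lexicon[key].add(ja)
    return dict(lexicon)


# Hypothetical sample pages standing in for crawled Web text.
pages = [
    "作家ジョン・スミス(John Smith)の新作。",
    "ジョン・スミス（John  Smith）が来日した。",
    "ジョン・スミス(John Smith)とメアリー・ジョーンズ(Mary Jones)。",
]

pairs = [pair for page in pages for pair in extract_pairs(page)]
lexicon = compile_lexicon(pairs)
print(len(pairs), "pairs ->", len(lexicon), "entries")
print(lexicon)
```

In this toy run, the three pages yield four collected pairs but only two lexicon entries. This shows one way a lexicon can end up with fewer entries than collected transliterations. The sketch does no person-name filtering, so a parenthesized gloss of an ordinary Katakana loanword would also be captured. A real system would presumably need a separate step to handle that.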


REFERENCES

[1] S. Kaide and S. Sato. A person-name classifier by using probability difference (in Japanese). In Proc. of NLP-09, 2009.
[2]
[3] Y. Sakakibara and S. Sato. Automatic compilation of a bilingual person-name lexicon (in Japanese). In Proc. of NLP-07, pages 879-882, 2007.
[4] W. Wentland, J. Knopp, C. Silberer, and M. Hartung. Building a multilingual lexical resource for named entity disambiguation, translation and transliteration. In Proc. of LREC-08, 2008.