Using Short Dependency Relations from Auto-Parsed Data for Chinese Dependency Parsing
Source
ACM Transactions on Asian Language Information Processing (TALIP)
Volume 8, Issue 3 (August 2009)
Article No. 10  
Year of Publication: 2009
ISSN:1530-0226
Authors
Wenliang Chen  National Institute of Information and Communications Technology
Daisuke Kawahara  National Institute of Information and Communications Technology
Kiyotaka Uchimoto  National Institute of Information and Communications Technology
Yujie Zhang  National Institute of Information and Communications Technology
Hitoshi Isahara  National Institute of Information and Communications Technology
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1568292.1568293

ABSTRACT

Dependency parsing has attracted a surge of interest lately for applications such as machine translation and question answering. Currently, several supervised learning methods can be used to train high-performance dependency parsers if sufficient labeled data are available.

However, currently used statistical dependency parsers provide poor results for words separated by long distances. To address this problem, this article presents an effective dependency parsing approach that incorporates short dependency information from unlabeled data. The unlabeled data are automatically parsed by a deterministic dependency parser, which exhibits relatively high performance for short dependencies between words. We then train another parser that uses the information on short dependency relations extracted from the output of the first parser. The proposed approach achieves an unlabeled attachment score of 86.52%, an absolute improvement of 1.24% over the baseline system on the Chinese Treebank data set. The results indicate that the proposed approach improves parsing performance for words separated by longer distances.
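
The extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the sentence format (a list of (word, head index) pairs), the distance threshold max_dist, and both function names are assumptions made for illustration only. The first function counts head-dependent word pairs whose distance in the auto-parsed output is short; the second computes an unlabeled attachment score as the fraction of words whose predicted head matches the gold head.

```python
from collections import Counter
from typing import List, Tuple

# Assumed format: one sentence is a list of (word, head_index) pairs.
# head_index is 1-based; 0 marks the root of the sentence.
Sentence = List[Tuple[str, int]]


def short_dependency_counts(parsed: List[Sentence], max_dist: int = 2) -> Counter:
    """Count (head_word, dependent_word) pairs at most max_dist words apart.

    max_dist is a hypothetical threshold for what counts as "short".
    """
    counts: Counter = Counter()
    for sent in parsed:
        for position, (word, head) in enumerate(sent, start=1):
            if head == 0:
                continue  # the root has no head word
            if abs(head - position) <= max_dist:
                head_word = sent[head - 1][0]
                counts[(head_word, word)] += 1
    return counts


def unlabeled_attachment_score(gold_heads: List[int], pred_heads: List[int]) -> float:
    """Fraction of words whose predicted head index equals the gold head index."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists must have equal length")
    if not gold_heads:
        return 0.0
    correct = sum(1 for g, p in zip(gold_heads, pred_heads) if g == p)
    return correct / len(gold_heads)


# Toy auto-parsed sentence (illustrative data, not from the Chinese Treebank).
auto_parsed = [[("I", 2), ("like", 0), ("red", 4), ("apples", 2)]]
pair_counts = short_dependency_counts(auto_parsed, max_dist=1)
```

With max_dist=1, only adjacent head-dependent pairs are counted, so the pair ("like", "apples") at distance 2 is excluded. Counts of this kind could then be supplied as additional features when training a second parser; how the article encodes them is not stated in the abstract.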

