ABSTRACT
The LinGO Redwoods initiative is a seed activity in the design and development of a new type of treebank. While several medium- to large-scale treebanks exist for English (and for other major languages), pre-existing publicly available resources exhibit the following limitations: (i) annotation is mono-stratal, encoding either topological (phrase structure) or tectogrammatical (dependency) information; (ii) the depth of linguistic information recorded is comparatively shallow; (iii) the design and format of linguistic representation in the treebank hard-wire a small, predefined range of ways in which information can be extracted from the treebank; and (iv) representations in existing treebanks are static and, over the (often year- or decade-long) evolution of a large-scale treebank, tend to fall behind the development of the field. LinGO Redwoods aims at the development of a novel treebanking methodology, rich in nature and dynamic both in the ways linguistic data can be retrieved from the treebank in varying granularity and in the constant evolution and regular updating of the treebank itself. Since October 2001, the project has been working to build the foundations for this new type of treebank, to develop a basic set of tools for treebank construction and maintenance, and to construct an initial set of 10,000 annotated trees to be distributed together with the tools under an open-source license.