The LinGO Redwoods treebank: motivation and preliminary applications
Source: International Conference on Computational Linguistics
Proceedings of the 19th International Conference on Computational Linguistics - Volume 2
Taipei, Taiwan
Pages: 1 - 5  
Year of Publication: 2002
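Authors: Stephan Oepen, Kristina Toutanova, Stuart Shieber, Christopher Manning, Dan Flickinger, Thorsten Brants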
Sponsor: ACL
Publisher
Association for Computational Linguistics  Morristown, NJ, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 17,   Citation Count: 13
DOI Bookmark: 10.3115/1071884.1071909

ABSTRACT

The LinGO Redwoods initiative is a seed activity in the design and development of a new type of treebank. While several medium- to large-scale treebanks exist for English (and for other major languages), pre-existing publicly available resources exhibit the following limitations: (i) annotation is mono-stratal, encoding either topological (phrase structure) or tectogrammatical (dependency) information, (ii) the depth of linguistic information recorded is comparatively shallow, (iii) the design and format of linguistic representation in the treebank hard-wire a small, predefined range of ways in which information can be extracted from the treebank, and (iv) representations in existing treebanks are static and, over the (often year- or decade-long) evolution of a large-scale treebank, tend to fall behind the development of the field. LinGO Redwoods aims at the development of a novel treebanking methodology, rich in nature and dynamic both in the ways linguistic data can be retrieved from the treebank in varying granularity and in the constant evolution and regular updating of the treebank itself. Since October 2001, the project has been working to build the foundations for this new type of treebank, to develop a basic set of tools for treebank construction and maintenance, and to construct an initial set of 10,000 annotated trees to be distributed together with the tools under an open-source license.



Collaborative Colleagues:
Stephan Oepen
Kristina Toutanova
Stuart Shieber
Christopher Manning
Dan Flickinger
Thorsten Brants