ABSTRACT
We describe a set of experiments using machine learning techniques for the task of extractive summarisation. The research is part of a summarisation project for which we use a corpus of judgments of the UK House of Lords. We present classification results for naïve Bayes and maximum entropy classifiers, and we explore methods for scoring the summary-worthiness of a sentence. We also present sample output from the system, illustrating the utility of rhetorical status information, which provides a means of structuring summaries and tailoring them to different types of users.