Extracting sentence segments for text summarization: a machine learning approach
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Athens, Greece
Pages: 152 - 159
Year of Publication: 2000
ISBN: 1-58113-226-3
Authors
Wesley T. Chuang
Computer Science Department, UCLA, Los Angeles, CA and HRL Laboratories, LLC, 3011 Malibu Canyon Road, Malibu, CA
Jihoon Yang
HRL Laboratories, LLC, 3011 Malibu Canyon Road, Malibu, CA
ABSTRACT
With the proliferation of the Internet and the huge amount of data it transfers, text summarization is becoming more important. We present an approach to the design of an automatic text summarizer that generates a summary by extracting sentence segments. First, sentences are broken into segments at special cue markers. Each segment is then represented by a set of predefined features (e.g., the location of the segment, the average term frequency of the words occurring in the segment, and the number of title words in the segment). Next, a supervised learning algorithm is used to train the summarizer to extract important sentence segments based on their feature vectors. Results of experiments on U.S. patents indicate that the proposed approach compares very favorably with other approaches (including the Microsoft Word summarizer) in terms of precision, recall, and classification accuracy.
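The pipeline outlined in the abstract (segment at cue markers, describe each segment by features, train a classifier) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the cue-marker list, the function names, and the exact feature definitions are assumptions, and a plain perceptron stands in for the supervised learner, which the abstract does not name.

```python
import re
from collections import Counter

# Hypothetical cue markers; the abstract does not list the ones actually used.
CUE_MARKERS = {"because", "although", "however", "but", "which"}


def split_segments(sentence):
    """Start a new segment each time a cue marker is encountered."""
    segments, current = [], []
    for word in sentence.split():
        if word.lower().strip(",.;") in CUE_MARKERS and current:
            segments.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        segments.append(" ".join(current))
    return segments


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def features(segments, title):
    """Return [relative position, average term frequency, title-word count] per segment."""
    doc_tf = Counter(w for seg in segments for w in tokenize(seg))
    title_words = set(tokenize(title))
    rows = []
    for i, seg in enumerate(segments):
        words = tokenize(seg)
        avg_tf = sum(doc_tf[w] for w in words) / len(words) if words else 0.0
        n_title = sum(1 for w in words if w in title_words)
        rows.append([i / max(len(segments) - 1, 1), avg_tf, float(n_title)])
    return rows


def predict(weights, x):
    """Return 1 (extract the segment) or 0 (skip it); the last weight is the bias."""
    score = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    return 1 if score > 0 else 0


def train_perceptron(X, y, epochs=20, lr=0.1):
    """Stand-in learner (an assumption): a basic perceptron over the feature vectors."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = label - predict(weights, x)
            for j, xj in enumerate(x):
                weights[j] += lr * err * xj
            weights[-1] += lr * err
    return weights
```

In use, one would call `split_segments` on each sentence, pass the resulting segments and the document title to `features`, and train on segments labelled 1 (belongs in the summary) or 0 (does not). Any supervised classifier could replace the perceptron without changing the rest of the sketch.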
CITED BY 15
Dou Shen, Zheng Chen, Qiang Yang, Hua-Jun Zeng, Benyu Zhang, Yuchang Lu, Wei-Ying Ma, Web-page classification through summarization, Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, July 25-29, 2004, Sheffield, United Kingdom
Ian Ruthven, Mark Baillie, Leif Azzopardi, Ralf Bierig, Emma Nicol, Simon Sweeney, Murat Yakici, Contextual factors affecting the utility of surrogates within exploratory search, Information Processing and Management: an International Journal, v.44 n.2, p.437-462, March, 2008