ABSTRACT
TXTRACTOR is a tool that uses established sentence-selection heuristics to rank text segments, producing summaries that contain a user-defined number of sentences. Text segments are identified in order to maximize topic diversity, an adaptation of the Maximal Marginal Relevance criterion used by Carbonell and Goldstein [5]. Sentence-selection heuristics are then used to rank the segments. We hypothesize that ranking text segments via traditional sentence-selection heuristics produces a balanced summary with more useful information than one produced by segmentation alone. The summary is created in a three-step process: 1) sentence evaluation, 2) segment identification, and 3) segment ranking. As the required summary length changes, low-ranking segments can be dropped from the summary or higher-ranking segments added to it. To validate the approach, we compare the output of TXTRACTOR to the output of a segmentation tool based on the TextTiling algorithm.
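The three-step process described in the abstract can be illustrated with a minimal Python sketch. It is not TXTRACTOR's implementation: the abstract does not name the heuristics, so the two used here (average word frequency and a position bonus) are assumptions, as are the function names `score_sentences`, `rank_segments`, and `summarize`. Segment identification is assumed to be done elsewhere, for example by a TextTiling-style segmenter, and is passed in as lists of sentence indices. Taking the best sentence from each top-ranked segment is also an assumption about how segments map to summary sentences.

```python
import re
from collections import Counter


def score_sentences(sentences):
    """Step 1 (sentence evaluation): score each sentence.

    Assumed heuristics: average in-document word frequency plus a
    bonus that decays with the sentence's position in the document.
    """
    words = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    scores = []
    for i, ws in enumerate(words):
        avg_freq = sum(freq[w] for w in ws) / len(ws) if ws else 0.0
        position_bonus = 1.0 / (i + 1)
        scores.append(avg_freq + position_bonus)
    return scores


def rank_segments(segments, scores):
    """Step 3 (segment ranking): order segments by mean sentence score.

    Each segment is a non-empty list of sentence indices (step 2,
    segment identification, is assumed to have produced them).
    """
    def mean_score(segment):
        return sum(scores[i] for i in segment) / len(segment)

    return sorted(segments, key=mean_score, reverse=True)


def summarize(sentences, segments, n):
    """Build a summary of at most n sentences, one per top-ranked segment."""
    scores = score_sentences(sentences)
    ranked = rank_segments(segments, scores)
    # Shrinking n drops low-ranking segments; growing n adds the next ones.
    chosen = [max(segment, key=lambda i: scores[i]) for segment in ranked[:n]]
    # Present the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Because each summary sentence comes from a different segment, the selection spreads across topics instead of clustering in the highest-scoring passage, which is the diversity goal the abstract attributes to segmentation.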
REFERENCES
[1] in TIPSTER Text Phase III 18-Month Workshop, (Fairfax, VA, 1998)
[2] Anderson, M.K. Nanotech Fine Tuning. http://www.wired.com/news/technology/0,1282,49447-2,00.html
[3] Aone, C., Okurowski, M.E., Gorlinsky, J. and Larsen, B. A Trainable Summarizer with Knowledge Acquired from Robust NLP Techniques. in Maybury, M.T. ed. Advances in Automatic Text Summarization, The MIT Press, Cambridge, 1999, 71--80
[4] Boguraev, B. and Kennedy, C. Salience-based Content Characterization of Text Documents. in Proceedings of the Workshop on Intelligent Scalable Text Summarization at the ACL/EACL Conference, (Madrid, Spain, 1997), 2--9
[5]
[6] Edmundson, H.P. New Methods in Automatic Extracting. in Maybury, M.T. ed. Advances in Automatic Text Summarization, The MIT Press, Cambridge, 1969, 23--42
[7] Firmin, T. and Chrzanowski, M.J. An Evaluation of Automatic Text Summarization Systems. in Maybury, M.T. ed. Advances in Automatic Text Summarization, The MIT Press, Cambridge, 1999
[8] Goldstein, J., Kantrowitz, M., Mittal, V. and Carbonell, J. Summarizing Text Documents: Sentence Selection and Evaluation Metrics. in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, (Berkeley, CA, 1999), 121--128. doi:10.1145/312624.312665
[9]
[10] Hovy, E. and Lin, C.-Y. Automated Text Summarization in SUMMARIST. in Maybury, M.T. ed. Advances in Automatic Text Summarization, The MIT Press, Cambridge, 1999, 81--94
[11]
[12] Kupiec, J., Pedersen, J. and Chen, F. A Trainable Document Summarizer. in Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, (Seattle, WA, 1995), 68--73. doi:10.1145/215206.215333
[13]
[14] Landers, A. Ann Landers. http://www.washingtonpost.com/wp-dyn/articles/A62823-2002Jan4.html
[15] Lessig, L. May the Source Be With You. Wired Magazine, 9.12 (December). http://www.wired.com/wired/archive/9.12/lessig.html
[16] Luhn, H.P. The Automatic Creation of Literature Abstracts. in Maybury, M.T. ed. Advances in Automatic Text Summarization, The MIT Press, Cambridge, 1958, 15--22
[17]
[18]
[19] Radev, D.R., Jing, H. and Budzikowska, M. Centroid-based Summarization of Multiple Documents: Sentence Extraction, Utility-based Evaluation, and User Studies. in NAACL-ANLP 2000 Workshop on Automatic Summarization, (Seattle, WA, 2000), 21--30. doi:10.3115/1117575.1117578
[20] Recap. No. 4 Virginia suffers first loss of season. http://sports.espn.go.com/ncaa/mbasketball/recap?gameId=220050189, 2002
[21]
[22]
[23] Shachtman, N. Turning Snooping Into Art. http://www.wired.com/news/culture/0,1284,49439,00.html
[24] Teufel, S. and Moens, M. Sentence Extraction as a Classification Task. in Workshop on Intelligent Scalable Summarization ACL/EACL Conference, (Madrid, Spain, 1999), 58--65
[25] Vaknin, S. A Primer on Narcissism. http://www.mentalhelp.net/poc/view_doc.php/type/doc/id/419
CITED BY 13
Wingyan Chung, Yiwen Zhang, Zan Huang, Gang Wang, Thian-Huat Ong, Hsinchun Chen, Internet searching and browsing in a multilingual world: an experiment on the Chinese business intelligence portal (CBizPort), Journal of the American Society for Information Science and Technology, v.55 n.9, p.818-831, July 2004
Yilu Zhou, Jialun Qin, Hsinchun Chen, Zan Huang, Yiwen Zhang, Wingyan Chung, Gang Wang, CMedPort: a cross-regional Chinese medical portal, Proceedings of the 3rd ACM/IEEE-CS joint conference on Digital libraries, May 27-31, 2003, Houston, Texas
Hsinchun Chen, Yilu Zhou, Jialun Qin, Catherine Larson, Digital government portal: a tool for digital government research, Proceedings of the 2004 annual national conference on Digital government research, p.1-2, May 24-26, 2004, Seattle, WA
S. Lamprier, T. Amghar, B. Levrat, F. Saubion, SegGen: a genetic algorithm for linear text segmentation, Proceedings of the 20th international joint conference on Artificial intelligence, p.1647-1652, January 06-12, 2007, Hyderabad, India