Enhancing diversity, coverage and balance for summarization through structure learning
Source
Proceedings of the 18th International Conference on World Wide Web (International World Wide Web Conference), Madrid, Spain
Session: Data mining / Text mining
Pages 71-80
Year of Publication: 2009
ISBN: 978-1-60558-487-4
Authors
Liangda Li, Shanghai Jiao-Tong University, Shanghai, China
Ke Zhou, Shanghai Jiao-Tong University, Shanghai, China
Gui-Rong Xue, Shanghai Jiao-Tong University, Shanghai, China
Hongyuan Zha, Georgia Institute of Technology, Atlanta, GA, USA
Yong Yu, Shanghai Jiao-Tong University, Shanghai, China
ABSTRACT
Document summarization plays an increasingly important role with the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to generate summaries from documents. However, these approaches seldom consider summary diversity, coverage, and balance simultaneously, even though these issues largely determine the quality of summaries. In this paper, we consider extract-based summarization with emphasis on the following three requirements: 1) diversity, which seeks to reduce redundancy among the sentences in the summary; 2) sufficient coverage, which focuses on avoiding the loss of the document's main information when generating the summary; and 3) balance, which demands that the different aspects of the document have about the same relative importance in the summary. We formulate the extract-based summarization problem as learning a mapping from the set of sentences of a given document to a subset of those sentences that satisfies the above three requirements. The mapping is learned by incorporating several constraints into a structure learning framework; we explore the graph structure of the output variables and employ a structural SVM to solve the resulting optimization problem. Experiments on the DUC2001 data sets demonstrate significant performance improvements in terms of the F1 and ROUGE metrics.
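To make the three requirements concrete, one can compute simple proxy scores for a candidate extract. The sketch below is illustrative only and is not the structural SVM formulation described above. It assumes bag-of-words sentence vectors with naive whitespace tokenization, and it measures diversity as one minus the mean pairwise cosine similarity of the selected sentences, coverage as the cosine similarity between the summary and the whole document, and balance as one minus the total variation distance between the aspect distribution of the document and that of the summary. The aspect labels, toy sentences, and function names (`bow`, `diversity`, `coverage`, `balance`, `score`) are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def bow(sentence):
    """Term counts for one sentence (lower-cased, naive whitespace tokenization)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def diversity(selected):
    """Proxy: 1 - mean pairwise cosine similarity; higher means less redundancy."""
    pairs = list(combinations(selected, 2))
    if not pairs:
        return 1.0
    return 1.0 - sum(cosine(a, b) for a, b in pairs) / len(pairs)


def coverage(selected, document):
    """Proxy: cosine similarity between the summary's and the document's term vectors."""
    return cosine(sum(selected, Counter()), sum(document, Counter()))


def balance(selected_ids, aspects):
    """Proxy: 1 - total variation distance between aspect distributions.

    aspects[i] is a hypothetical aspect (e.g. cluster) label for sentence i.
    """
    if not selected_ids:
        return 0.0
    doc_counts = Counter(aspects)
    sum_counts = Counter(aspects[i] for i in selected_ids)
    tv = 0.5 * sum(
        abs(doc_counts[label] / len(aspects) - sum_counts[label] / len(selected_ids))
        for label in doc_counts
    )
    return 1.0 - tv


# Hypothetical toy document: sentences 0 and 1 are near-duplicates;
# aspect 0 is the ruling, aspect 1 is the market reaction.
sentences = [
    "the court ruled on the merger",
    "the court ruled on the merger today",
    "shares rose after the ruling",
    "analysts expect more mergers",
]
aspects = [0, 0, 1, 1]
vectors = [bow(s) for s in sentences]


def score(ids):
    """Return (diversity, coverage, balance) for the extract made of sentences `ids`."""
    selected = [vectors[i] for i in ids]
    return diversity(selected), coverage(selected, vectors), balance(ids, aspects)


print(score([0, 1]))  # redundant pair drawn from a single aspect
print(score([0, 2]))  # less redundant pair spanning both aspects
```

Because sentences 0 and 1 are near-duplicates, the extract [0, 1] scores lower on diversity than [0, 2], and because it draws only from aspect 0 it also scores lower on balance. A learned model would trade such scores off against one another; this sketch only computes them.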