Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization
Source
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (Annual ACM Conference on Research and Development in Information Retrieval)
Singapore, Singapore
SESSION: Summarization
Pages 307-314
Year of Publication: 2008
ISBN: 978-1-60558-164-4
Authors
Dingding Wang, Florida International University, Miami, FL, USA
Tao Li, Florida International University, Miami, FL, USA
Shenghuo Zhu, NEC Labs. America, Inc, Cupertino, CA, USA
Chris Ding, University of Texas at Arlington, Arlington, TX, USA
ABSTRACT
Multi-document summarization aims to create a compressed summary while retaining the main characteristics of the original set of documents. Many approaches use statistics and machine learning techniques to extract sentences from documents. In this paper, we propose a new multi-document summarization framework based on sentence-level semantic analysis and symmetric non-negative matrix factorization. We first calculate sentence-sentence similarities using semantic analysis and construct the similarity matrix. Then symmetric matrix factorization, which has been shown to be equivalent to normalized spectral clustering, is used to group sentences into clusters. Finally, the most informative sentences are selected from each group to form the summary. Experimental results on the DUC2005 and DUC2006 data sets show that the proposed framework improves on the existing summarization systems we implemented. We also study the factors that contribute to its performance.
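The three steps named in the abstract (similarity matrix, symmetric factorization, per-cluster selection) can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: bag-of-words cosine similarity stands in for the paper's sentence-level semantic analysis, the factorization uses a commonly cited multiplicative update for symmetric NMF with a damping parameter `beta`, and the "most informative" sentence in a cluster is taken to be the one most similar to its cluster-mates. The function names `cosine_similarity_matrix`, `symmetric_nmf`, and `summarize` are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(sentences):
    """Assumed stand-in for semantic similarity: bag-of-words cosine."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(sentences), len(vocab)))
    for row, s in enumerate(sentences):
        for w in s.lower().split():
            X[row, index[w]] += 1.0
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty sentences
    X = X / norms
    return X @ X.T  # symmetric, non-negative similarity matrix W


def symmetric_nmf(W, k, n_iter=200, beta=0.5, seed=0, eps=1e-9):
    """Approximate W by H @ H.T with H >= 0 (assumed multiplicative update)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], k)) + eps
    for _ in range(n_iter):
        numer = W @ H
        denom = H @ (H.T @ H) + eps
        # Damped multiplicative step; the factor is always positive,
        # so H stays non-negative.
        H *= (1.0 - beta) + beta * numer / denom
    return H


def summarize(sentences, k):
    """Cluster sentences with symmetric NMF, then pick one per cluster."""
    W = cosine_similarity_matrix(sentences)
    H = symmetric_nmf(W, k)
    labels = H.argmax(axis=1)  # cluster = column with the largest weight
    chosen = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue  # a cluster may end up empty
        # Assumed selection rule: highest total similarity within the cluster.
        scores = W[np.ix_(members, members)].sum(axis=1)
        chosen.append(int(members[scores.argmax()]))
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, k)` on a list of sentence strings returns at most `k` sentences in their original order. A real system would replace the cosine step with a semantic similarity measure and would likely use a richer selection score.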