Multi-document summarization using cluster-based link analysis
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Singapore, Singapore
SESSION: Summarization
Pages 299-306
Year of Publication: 2008
ISBN: 978-1-60558-164-4
Authors
Xiaojun Wan, Peking University, Beijing, China
Jianwu Yang, Peking University, Beijing, China
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1390334.1390386

ABSTRACT

The Markov Random Walk model has recently been exploited for multi-document summarization by making use of the link relationships between sentences in the document set, under the assumption that all sentences are indistinguishable from each other. However, a given document set usually covers a few topic themes, each represented by a cluster of sentences. The topic themes are usually not equally important, and the sentences in an important theme cluster are deemed more salient than those in a trivial theme cluster. This paper proposes the Cluster-based Conditional Markov Random Walk Model (ClusterCMRW) and the Cluster-based HITS Model (ClusterHITS) to fully leverage the cluster-level information. Experimental results on the DUC2001 and DUC2002 datasets demonstrate the effectiveness of the proposed summarization models. The results also show that the ClusterCMRW model is more robust than the ClusterHITS model with respect to the number of clusters.
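
The abstract does not give the ClusterCMRW formulas, so the following is only a minimal sketch of the general idea, not the authors' model: a PageRank-style random walk over a sentence similarity graph in which each edge is scaled by the importance of the target sentence's cluster. The function name, the parameters (sim, cluster_of, cluster_weight, damping), and the specific weighting scheme are all illustrative assumptions.

```python
def cluster_weighted_random_walk(sim, cluster_of, cluster_weight,
                                 damping=0.85, tol=1e-8, max_iter=200):
    """Score sentences with a random walk biased toward important clusters.

    sim            -- n x n list of lists of non-negative sentence similarities
    cluster_of     -- list mapping sentence index -> cluster id
    cluster_weight -- dict mapping cluster id -> positive importance weight
    All three inputs and the weighting rule are assumptions for illustration.
    """
    n = len(sim)
    # Assumed conditioning rule: scale edge i -> j by the weight of j's cluster.
    # Self-links are dropped, as is common in graph-based sentence ranking.
    weighted = [
        [0.0 if i == j else sim[i][j] * cluster_weight[cluster_of[j]]
         for j in range(n)]
        for i in range(n)
    ]
    # Row-normalize into transition probabilities; a row with no outgoing
    # weight falls back to a uniform jump so the matrix stays stochastic.
    transition = []
    for row in weighted:
        total = sum(row)
        if total > 0:
            transition.append([x / total for x in row])
        else:
            transition.append([1.0 / n] * n)
    # Power iteration with a uniform damping (teleport) term.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        updated = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    # Four equally similar sentences; cluster 0 is assumed twice as important.
    sim = [[0.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
    print(cluster_weighted_random_walk(sim, [0, 0, 1, 1], {0: 2.0, 1: 1.0}))
```

With uniform cluster weights this reduces to an ordinary damped random walk over the similarity graph, which corresponds to the baseline assumption that all sentences are indistinguishable; unequal weights shift score mass toward sentences in the more important theme clusters.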

