Update summarization based on novel topic distribution
Source
Proceedings of the 9th ACM Symposium on Document Engineering
Munich, Germany
SESSION: Document and linguistics (II)
Pages 205-213
Year of Publication: 2009
ISBN: 978-1-60558-575-8
Authors
Josef Steinberger, University of West Bohemia, Pilsen, Czech Republic
Karel Ježek, University of West Bohemia, Pilsen, Czech Republic
Sponsors
SIGDOC: ACM Special Interest Group for Design of Communications
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1600193.1600239

ABSTRACT

This paper presents our recent research in text summarization. The field has moved from multi-document summarization to update summarization. When producing an update summary of a set of topic-related documents, the summarizer assumes that the reader already has prior knowledge, determined by a set of older documents on the same topic. The update summarizer must therefore solve a novelty vs. redundancy problem. We describe the development of our summarizer, which is based on Iterative Residual Rescaling (IRR), a method that creates the latent semantic space of the set of documents under consideration. IRR generalizes Singular Value Decomposition (SVD) and makes it possible to control the influence of major and minor topics in the latent space. Our sentence-extractive summarization method computes the redundancy, novelty and significance of each topic. These values are then used in the sentence selection process. The sentence selection component also prevents redundancy within the summary. The results of our participation in the TAC evaluation appear promising.
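
To make the relation between IRR and SVD more concrete, the following is a minimal sketch of the residual-rescaling loop as it is commonly described for IRR, not the authors' implementation. The function name irr_basis, the parameters num_topics and q, and the toy matrix are assumptions introduced only for illustration. Columns of the input matrix stand for sentences (or documents) and rows for terms.

```python
import numpy as np


def irr_basis(term_by_sentence, num_topics, q=0.0):
    """Return num_topics orthonormal topic vectors (as columns).

    Simplified, illustrative IRR loop; q is the rescaling exponent
    (a hypothetical parameter name). With q = 0 no rescaling happens
    and the vectors match the left singular vectors of the input.
    """
    residual = np.asarray(term_by_sentence, dtype=float).copy()
    basis = []
    for _ in range(num_topics):
        # Rescale every residual column by its own length raised to q.
        lengths = np.linalg.norm(residual, axis=0)
        scaled = residual * lengths ** q
        # The next topic is the dominant left singular vector
        # of the rescaled residual matrix.
        u, _, _ = np.linalg.svd(scaled, full_matrices=False)
        topic = u[:, 0]
        basis.append(topic)
        # Project the found topic out of the unscaled residuals.
        residual = residual - np.outer(topic, topic @ residual)
    return np.column_stack(basis)


# Toy term-by-sentence matrix (5 terms, 4 sentences), made up for the demo.
A = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 3.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
    [1.0, 0.0, 0.0, 1.0],
])

svd_like_topics = irr_basis(A, num_topics=3, q=0.0)
rescaled_topics = irr_basis(A, num_topics=3, q=2.0)
```

With q = 0 the loop reduces to computing the leading left singular vectors one at a time, which is the sense in which IRR generalizes SVD. In this sketch a larger q gives more weight to columns that are still poorly explained by the topics found so far, which is one way the influence of minor topics could be increased. How the paper sets this parameter, and how redundancy, novelty and significance are derived from the resulting topics, is not stated in the abstract.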

