The use of unlabeled data to improve supervised learning for text summarization
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Tampere, Finland
SESSION: Summarization
Pages: 105 - 112
Year of Publication: 2002
ISBN: 1-58113-561-0
Authors
Massih-Reza Amini  University of Pierre and Marie Curie, Paris, France
Patrick Gallinari  University of Pierre and Marie Curie, Paris, France
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/564376.564397

ABSTRACT

With the huge amount of information available electronically, there is an increasing demand for automatic text summarization systems. The use of machine learning techniques for this task allows one to adapt summaries to user needs and to corpus characteristics. These desirable properties have motivated an increasing amount of work in this field over the last few years. Most approaches attempt to generate summaries by extracting sentence segments and adopt the supervised learning paradigm, which requires labeling documents at the text-span level. This is a costly process that strongly limits the applicability of these methods. We investigate here the use of semi-supervised algorithms for summarization. These techniques use a small amount of labeled data together with a larger amount of unlabeled data. We propose new semi-supervised algorithms for training classification models for text summarization. We analyze their performance on two data sets: the Reuters news-wire corpus and the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC. We compare them with a baseline non-learning system and a reference trainable summarizer.

