ABSTRACT
With the huge amount of information available electronically, there is an increasing demand for automatic text summarization systems. Machine learning techniques make it possible to adapt summaries to user needs and to corpus characteristics, which has motivated a growing body of work in this field over the last few years. Most approaches generate summaries by extracting sentence segments and adopt the supervised learning paradigm, which requires labeling documents at the text-span level. This labeling is costly and strongly limits the applicability of these methods. We investigate the use of semi-supervised algorithms for summarization. These techniques use a small amount of labeled data together with a larger amount of unlabeled data. We propose new semi-supervised algorithms for training classification models for text summarization. We analyze their performance on two data sets: the Reuters news-wire corpus and the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC. We compare them with a baseline (non-learning) system and a reference trainable summarizer system.