ABSTRACT
This article presents a novel automatic method (AutoSummENG) for evaluating summarization systems, based on comparing the character n-gram graph representations of the extracted summaries with those of a number of model summaries. Owing to its statistical nature, the approach is language-neutral, and its evaluation performance appears to match or even exceed that of other contemporary evaluation methods. Within this study, we measure the effectiveness of different representation methods (namely, word and character n-gram graphs and histograms), of different methods for indicating n-gram neighborhood, and of different methods for comparing the supplied representations. The study concludes with a theory for the a priori determination of the methods' parameters, together with supporting experiments, so as to provide a complete alternative to existing methods for the automatic evaluation of summarization systems.
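To make the core idea concrete, the following is a minimal Python sketch of a character n-gram graph and one possible way of comparing two such graphs. It assumes that nodes are character n-grams, that edges link n-grams starting within a fixed window of each other (one possible neighborhood indication method), that edge weights count co-occurrences, and that a peer summary is scored by averaging its similarity to every model summary. The function names, the default values of `n` and `window`, and the min/max "value similarity" measure are illustrative assumptions, not a definitive reproduction of AutoSummENG.

```python
from collections import Counter
from statistics import mean
from typing import List


def char_ngram_graph(text: str, n: int = 3, window: int = 3) -> Counter:
    """Build a character n-gram graph as a weighted edge map (hypothetical helper).

    Each n-gram is linked to the n-grams that start up to `window` positions
    before it; the edge weight counts how often that pair co-occurs. Edges are
    stored as sorted pairs, so the graph is treated as undirected.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges: Counter = Counter()
    for i, gram in enumerate(grams):
        for j in range(max(0, i - window), i):
            edges[tuple(sorted((gram, grams[j])))] += 1
    return edges


def value_similarity(g1: Counter, g2: Counter) -> float:
    """Compare two graphs (assumed measure).

    For every edge the graphs share, add min(weight)/max(weight); then
    normalise by the number of edges in the larger graph. Identical graphs
    score 1.0, graphs with no shared edges score 0.0.
    """
    if not g1 or not g2:
        return 0.0
    shared = g1.keys() & g2.keys()
    total = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in shared)
    return total / max(len(g1), len(g2))


def score_summary(peer: str, models: List[str], n: int = 3, window: int = 3) -> float:
    """Average the peer's graph similarity over all model summaries (non-empty list)."""
    peer_graph = char_ngram_graph(peer, n, window)
    return mean(value_similarity(peer_graph, char_ngram_graph(m, n, window)) for m in models)
```

For example, `score_summary("abcd", ["abcd", "abc"], n=2, window=1)` averages a perfect match (1.0) with a partial one (0.5). A histogram representation would instead drop the edges and compare n-gram frequency counts directly. The graph additionally keeps information about which n-grams appear near each other, which a histogram discards.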