Extrinsic summarization evaluation: A decision audit task
Source
ACM Transactions on Speech and Language Processing (TSLP) archive
Volume 6, Issue 2 (October 2009)
Article No. 2
Year of Publication: 2009
ISSN: 1550-4875
Authors
Gabriel Murray, University of British Columbia
Thomas Kleinbauer, German Research Center for Artificial Intelligence (DFKI)
Peter Poller, German Research Center for Artificial Intelligence (DFKI)
Tilman Becker, German Research Center for Artificial Intelligence (DFKI)
Steve Renals, University of Edinburgh
Jonathan Kilgour, University of Edinburgh
Publisher
ACM, New York, NY, USA

DOI: http://doi.acm.org/10.1145/1596517.1596518

ABSTRACT

In this work, we describe a large-scale extrinsic evaluation of automatic speech summarization technologies for meeting speech. The particular task is a decision audit, in which a user must satisfy a complex information need by navigating several meetings to understand how and why a given decision was made. We compare the usefulness of extractive and abstractive technologies in satisfying this information need and assess the impact of automatic speech recognition (ASR) errors on user performance. We evaluate participant performance with several methods, including post-questionnaire data, subjective and objective human judgments, and a detailed analysis of participant browsing behavior. We find that although ASR errors affect user satisfaction on an information retrieval task, users can adapt their browsing behavior to complete the task satisfactorily. Results also indicate that users consider extractive summaries to be intuitive and useful tools for browsing multimodal meeting data. We discuss areas in which automatic summarization techniques could be improved, relative to gold-standard meeting abstracts.

