Automatic generation of concise summaries of spoken dialogues in unrestricted domains
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
New Orleans, Louisiana, United States
Pages: 199 - 207  
Year of Publication: 2001
ISBN: 1-58113-331-6
Author
Klaus Zechner  Carnegie Mellon Univ., Pittsburgh, PA
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/383952.383989

ABSTRACT

Automatic summarization of open-domain spoken dialogues is a new research area. This paper introduces the task and the challenges involved, and presents an approach to obtaining automatic extract summaries for multi-party dialogues of four different genres, without any restriction on domain. We address the following issues, which are intrinsic to spoken dialogue summarization and can typically be ignored when summarizing written text such as newswire data: (i) detection and removal of speech disfluencies; (ii) detection and insertion of sentence boundaries; (iii) detection and linking of cross-speaker information units (question-answer pairs). A global system evaluation using a corpus of 23 relevance-annotated dialogues containing 80 topical segments shows that, for the two more informal genres, our summarization system using dialogue-specific components significantly outperforms a baseline using TFIDF term weighting with maximum marginal relevance (MMR) ranking.
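
To make the baseline named in the abstract concrete, the following is a minimal Python sketch of TFIDF term weighting combined with MMR ranking. It is an illustration only, not the system evaluated in the paper: the function names (tfidf_vectors, cosine, mmr_rank), the whitespace tokenizer, the smoothed IDF formula, the use of the segment centroid as the query, and the trade-off parameter lam are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(segments):
    """Build one sparse TF-IDF vector (term -> weight) per text segment."""
    tokenized = [seg.lower().split() for seg in segments]  # assumed tokenizer
    n = len(tokenized)
    # Document frequency: number of segments in which each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumption), so no term gets a zero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_rank(segments, k, lam=0.7):
    """Return indices of up to k segments chosen greedily by MMR.

    Each step picks the candidate maximizing
    lam * sim(candidate, query) - (1 - lam) * max sim(candidate, selected).
    The query is the sum of all segment vectors (hypothetical choice).
    """
    vectors = tfidf_vectors(segments)
    query = {}
    for vec in vectors:
        for term, weight in vec.items():
            query[term] = query.get(term, 0.0) + weight

    selected = []
    candidates = list(range(len(segments)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], query)
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    turns = [
        "we should meet on tuesday",
        "we should meet on tuesday",
        "the budget report is late",
    ]
    print(mmr_rank(turns, k=2, lam=0.5))
```

With lam close to 1 the ranking favors relevance to the query alone; lowering it penalizes segments that repeat already selected material, which is the property that makes MMR a natural baseline for extract summaries.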


