Automatic discovery of salient segments in imperfect speech transcripts
Source: Conference on Information and Knowledge Management
Proceedings of the tenth international conference on Information and knowledge management
Atlanta, Georgia, USA
Session: Multimedia Information Processing
Pages: 490 - 497  
Year of Publication: 2001
ISBN: 1-58113-436-3
Authors
Dulce Ponceleon  IBM Almaden Research Center, San Jose, CA
Savitha Srinivasan  IBM Almaden Research Center, San Jose, CA
Sponsors
SIGMIS: ACM Special Interest Group on Management Information Systems
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 25,   Citation Count: 3
DOI: http://doi.acm.org/10.1145/502585.502668

ABSTRACT

This paper addresses the problem of automatically detecting salient video segments, based on the associated speech transcriptions, for real-world applications such as corporate training. We present a novel segmentation algorithm based on automatic speech recognition (ASR) applied to the audio track of the video. Our feature set consists of word n-grams extracted from the imperfect speech transcriptions. We use a two-pass algorithm that combines a boundary-based method with a content-based method. In the first pass, we analyze the temporal distribution and the rate of arrival of features to compute an initial segmentation. In the second pass, we detect changes in content-bearing words by using the content-bearing features as queries in an information retrieval system. The content-based second pass validates the initial segments and merges them as needed. Variations in the structure of the audio/video content and in the accuracy of ASR affect the feasibility of the segmentation task. For realistic data, we observe that we can identify content-rich segments of the audio. In the best case, a high-level table of contents is generated; in the worst case, a single salient segment is identified. We illustrate the algorithm in detail with some examples and validate the results against manual segmentation boundaries.
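
The two-pass structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the boundary criterion or the retrieval setup, so the sketch assumes that a long gap between feature arrivals marks a boundary in the first pass, and it substitutes a plain cosine similarity over term counts for the information retrieval system in the second pass. The function names, the `(time, term)` input format, and the `max_gap` and `min_similarity` thresholds are all hypothetical.

```python
from collections import Counter
from math import sqrt


def first_pass(features, max_gap):
    """Boundary-based pass (assumed criterion): start a new segment whenever
    the gap between consecutive feature arrivals exceeds max_gap seconds.

    features is a list of (time_in_seconds, ngram) pairs.
    """
    segments, current = [], []
    for time, term in sorted(features):
        if current and time - current[-1][0] > max_gap:
            segments.append(current)
            current = []
        current.append((time, term))
    if current:
        segments.append(current)
    return segments


def cosine(terms_a, terms_b):
    """Cosine similarity between two bags of terms (stand-in for an IR query)."""
    ca, cb = Counter(terms_a), Counter(terms_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def second_pass(segments, min_similarity):
    """Content-based pass: merge a segment into its predecessor when their
    term content is similar enough; otherwise keep it as a separate segment."""
    merged = []
    for seg in segments:
        if merged and cosine([t for _, t in merged[-1]],
                             [t for _, t in seg]) >= min_similarity:
            merged[-1] = merged[-1] + seg
        else:
            merged.append(seg)
    return merged


if __name__ == "__main__":
    # Made-up feature arrivals, for illustration only.
    demo = [
        (0.0, "stock options"), (2.0, "vesting schedule"), (3.0, "stock options"),
        (40.0, "stock options"), (42.0, "vesting schedule"),
        (120.0, "expense report"), (123.0, "travel policy"),
    ]
    initial = first_pass(demo, max_gap=10.0)
    final = second_pass(initial, min_similarity=0.5)
    for seg in final:
        print((seg[0][0], seg[-1][0]))  # prints (0.0, 42.0) then (120.0, 123.0)
```

In this toy run the first pass yields three candidate segments, and the second pass merges the first two because they share the same content-bearing terms. Real ASR output would need tuned thresholds and a more robust notion of content similarity.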



