Analyzing discussion scene contents in instructional videos
Source: International Multimedia Conference
Proceedings of the 12th annual ACM international conference on Multimedia
New York, NY, USA
POSTER SESSION: Technical poster session 1: multimedia analysis, processing, and retrieval
Pages: 264-267
Year of Publication: 2004
ISBN: 1-58113-893-8
Authors
Ying Li, IBM T.J. Watson Research Center, NY
Chitra Dorai, IBM T.J. Watson Research Center, NY
Sponsors
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1027527.1027587

ABSTRACT

This paper describes our ongoing work on analyzing the contents of discussion scenes in instructional videos using a clustering technique. Specifically, given a discussion scene pre-detected from an educational or training video, we first apply a mode-based clustering approach to group all speech segments into an optimal number of clusters, where each cluster contains speech from one speaker. We then analyze the discussion patterns in the scene and classify it as either a 2-speaker or a multi-speaker discussion. Encouraging classification results have been achieved on 122 discussion scenes detected from five IBM MicroMBA videos. We have also observed fairly good performance from the speaker clustering scheme, which supports the effectiveness of the proposed clustering approach. The discussion scene information output by this analysis scheme should facilitate content browsing, searching, and understanding of instructional videos.
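
As a rough illustration of the final classification step only, the sketch below labels a scene from already-clustered speech segments. It is not the authors' method: the abstract does not say how discussion patterns are analyzed, so a simple threshold on each cluster's share of speaking time is used as a stand-in. The function name `classify_discussion`, the `(cluster_id, duration)` input format, and the `min_share` threshold are all assumptions made for this example.

```python
from collections import defaultdict


def classify_discussion(segments, min_share=0.10):
    """Label a discussion scene from clustered speech segments.

    segments: list of (cluster_id, duration_seconds) pairs, one per speech
        segment; cluster_id is assumed to come from a prior speaker
        clustering step (one cluster ~ one speaker).
    min_share: hypothetical threshold; clusters holding less than this
        fraction of the total speech time are ignored as spurious.
    """
    # Accumulate total speaking time per cluster.
    time_per_cluster = defaultdict(float)
    for cluster_id, duration in segments:
        time_per_cluster[cluster_id] += duration

    total = sum(time_per_cluster.values())
    if total <= 0:
        raise ValueError("scene has no speech to analyze")

    # Keep only clusters that account for a meaningful share of the speech.
    speakers = sorted(
        c for c, t in time_per_cluster.items() if t / total >= min_share
    )

    # The scene is assumed to be pre-detected as a discussion, so at most
    # two significant speakers maps to the 2-speaker label.
    label = "2-speaker" if len(speakers) <= 2 else "multi-speaker"
    return label, speakers


if __name__ == "__main__":
    # Cluster "C" holds under 10% of the speech time and is ignored.
    scene = [("A", 12.0), ("B", 8.5), ("A", 6.0), ("C", 0.8), ("B", 9.0)]
    print(classify_discussion(scene))
```

Filtering by speaking-time share is one way to keep a short, mis-clustered segment from turning a two-person exchange into a multi-speaker one. The right threshold would depend on the clustering quality and is not given in the source.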

