ABSTRACT
This paper describes our ongoing work on analyzing the content of discussion scenes in instructional videos using a clustering technique. Specifically, given a discussion scene pre-detected in an education or training video, we first apply a mode-based clustering approach to group all speech segments into an optimal number of clusters, where each cluster contains speech from one speaker. We then analyze the discussion patterns in the scene and classify it as either a 2-speaker or a multi-speaker discussion. Encouraging classification results have been achieved on 122 discussion scenes detected from five IBM MicroMBA videos. We have also observed fairly good performance from the speaker clustering scheme, which demonstrates the effectiveness of the proposed clustering approach. The discussion scene information output by this analysis scheme would facilitate content browsing, searching, and understanding of instructional videos.