Augmented segmentation and visualization for presentation videos
Source: International Multimedia Conference
Proceedings of the 13th annual ACM international conference on Multimedia
Hilton, Singapore
SESSION: Applications 1: media fusion for communication and presentation
Pages: 51-60
Year of Publication: 2005
ISBN: 1-59593-044-2
Authors
Alexander Haubold, Columbia University, New York, NY
John R. Kender, Columbia University, New York, NY
Sponsors
ACM: Association for Computing Machinery
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 59,   Citation Count: 5
DOI: http://doi.acm.org/10.1145/1101149.1101158

ABSTRACT

We investigate methods of segmenting, visualizing, and indexing presentation videos using both audio and visual data. The audio track is segmented by speaker and augmented with key phrases extracted using an Automatic Speech Recognizer (ASR). The video track is segmented by visual dissimilarities and changes in speaker gesturing, and augmented with representative key frames. An interactive user interface combines visual representations of audio, video, text, and key frames, and allows the user to navigate presentation videos. User studies with 176 students of varying knowledge were conducted on 7.5 hours of student presentation video (32 presentations). Tasks included searching for various portions of presentations, both known and unknown to the students, and summarizing presentations given the annotations. The results favor the video summaries and the interface, suggesting responses that are 20% faster than with access to the actual video. Accuracy of responses remained the same on average. Follow-up surveys yielded a number of suggestions for improving the interface, such as incorporating automatic speaker clustering and identification and displaying an abstract topological view of the presentation. The surveys also reveal alternative contexts in which students would like to use the tool in the classroom environment.


