ABSTRACT
We investigate methods of segmenting, visualizing, and indexing presentation videos using both audio and visual data. The audio track is segmented by speaker and augmented with key phrases extracted using an Automatic Speech Recognizer (ASR). The video track is segmented by visual dissimilarities and changes in speaker gesturing, and augmented with representative key frames. An interactive user interface combines visual representations of audio, video, text, and key frames, and allows the user to navigate presentation videos. User studies with 176 students of varying knowledge were conducted on 7.5 hours of student presentation video (32 presentations). Tasks included searching for various portions of presentations, both known and unknown to the students, and summarizing presentations given the annotations. The results favor the video summaries and the interface, suggesting responses that are 20% faster than with access to the actual video, while the accuracy of responses remained the same on average. Follow-up surveys yielded a number of suggestions for improving the interface, such as incorporating automatic speaker clustering and identification and displaying an abstract topological view of the presentation. The surveys also identify alternative contexts in which students would like to use the tool in the classroom environment.
INDEX TERMS
Primary Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.1 Content Analysis and Indexing
Subjects: Indexing methods
Additional Classification:
H. Information Systems
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.1 Multimedia Information Systems
Subjects: Video (e.g., tape, disk, DVI); Evaluation/methodology
General Terms: Algorithms, Design, Human Factors, Management
Keywords: audio, browsing, cross-reference, gesture, multimedia, presentation video, presenter, segmentation, speaker, summarization, text, user interface, user studies, video library, visualization