Fast unsupervised alignment of video and text for indexing/names and faces
Source: International Multimedia Conference, Workshop on multimedia information retrieval on The many faces of multimedia semantics
Augsburg, Bavaria, Germany
Session: Semantics of video
Pages: 57-64
Year of Publication: 2007
ISBN: 978-1-59593-782-7
ABSTRACT
We propose a novel, unsupervised way of combining weakly associated video/audio and text streams that is faster than conventional speech recognition. The technique aligns the audio/video and text streams, which enables video search using the associated text. Multimedia of this form includes news broadcasts with summaries, parliamentary proceedings and court trials with transcripts, and sports telecasts with text commentary. We also show how to annotate the video with the names of the people appearing in it, which allows name-based indexing and search. We test the technique on an 80-minute video segment downloaded from the website of the International Criminal Tribunal for the former Yugoslavia, together with the corresponding transcripts. The proposed technique achieves 88.49% accuracy on sentence-level alignment and 95.5% accuracy on the task of assigning names to faces.
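The abstract does not describe how the alignment is computed. To give a rough idea of what sentence-level alignment of a transcript to a time-ordered stream involves, here is a minimal Python sketch of a generic monotonic (DTW-style) alignment. This is not the authors' method. It assumes that each audio segment already carries a hypothetical bag of spotted words and uses word overlap (Jaccard similarity) as a stand-in score. The names `monotonic_align` and `jaccard` and the example data are illustrative only.

```python
from typing import List, Sequence

NEG_INF = float("-inf")


def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity between two bags of words (illustrative score only)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def monotonic_align(scores: Sequence[Sequence[float]]) -> List[int]:
    """Map each time-ordered segment i to a sentence index so that the path
    starts at sentence 0, ends at the last sentence, and advances by at most
    one sentence per segment, maximising the summed score.
    scores[i][j] is the (assumed) similarity of segment i to sentence j."""
    m = len(scores)
    if m == 0:
        return []
    n = len(scores[0])
    if n == 0 or m < n:
        raise ValueError("need at least one sentence and at least as many segments as sentences")
    best = [[NEG_INF] * n for _ in range(m)]
    back = [[0] * n for _ in range(m)]  # 0 = stayed on sentence, 1 = advanced by one
    best[0][0] = scores[0][0]
    for i in range(1, m):
        for j in range(n):
            stay = best[i - 1][j]
            adv = best[i - 1][j - 1] if j > 0 else NEG_INF
            if stay == NEG_INF and adv == NEG_INF:
                continue  # cell unreachable under the monotonic constraint
            if adv > stay:
                best[i][j], back[i][j] = adv + scores[i][j], 1
            else:
                best[i][j], back[i][j] = stay + scores[i][j], 0
    # Trace back from the last segment aligned to the last sentence.
    path, j = [], n - 1
    for i in range(m - 1, -1, -1):
        path.append(j)
        j -= back[i][j]
    return path[::-1]


# Hypothetical transcript sentences and words spotted in five audio segments.
sentences = [
    "the witness will now take the oath",
    "please state your full name",
    "my name is john smith",
]
segments = ["witness take oath", "now the oath", "state your name", "full name please", "john smith"]
sent_words = [set(s.split()) for s in sentences]
seg_words = [set(s.split()) for s in segments]
scores = [[jaccard(g, s) for s in sent_words] for g in seg_words]
print(monotonic_align(scores))  # -> [0, 0, 1, 1, 2]
```

The stay-or-advance-by-one constraint preserves the transcript's sentence order and forbids skipping a sentence. Real recordings with untranscribed or reordered speech would probably need a looser transition model.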