Fast unsupervised alignment of video and text for indexing/names and faces
Source
International Multimedia Conference archive
Workshop on multimedia information retrieval on The many faces of multimedia semantics
Augsburg, Bavaria, Germany
SESSION: Semantics of video
Pages: 57 - 64  
Year of Publication: 2007
ISBN:978-1-59593-782-7
Authors
Subhransu Maji  University of California: Berkeley, Berkeley, CA
Ruzena Bajcsy  University of California: Berkeley, Berkeley, CA
Sponsors
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1290067.1290077

ABSTRACT

We propose a novel, unsupervised way of combining weakly associated video/audio and text streams that is faster than conventional speech recognition. The technique aligns the audio/video and text streams, which enables video search using the associated text. Multimedia of this form includes news broadcasts with summaries, parliamentary proceedings and court trials with transcripts, sports telecasts with text commentary, etc. We also show how to annotate the video with the names of the people appearing in it, which allows name-based indexing and search. We test the technique on an 80-minute video segment downloaded from the website of the International Court of the Former Yugoslavia, together with the corresponding transcripts. The proposed technique achieves 88.49% accuracy on sentence-level alignments and 95.5% accuracy on the task of assigning names to faces.

