Identifying relevant frames in weakly labeled videos for training concept detectors
Source
Conference on Image and Video Retrieval
Proceedings of the 2008 International Conference on Content-based Image and Video Retrieval
Niagara Falls, Canada
Session: Tagging, training and classification
Pages 9-16  
Year of Publication: 2008
ISBN:978-1-60558-070-8
Authors
Adrian Ulges  Technical University, Kaiserslautern, Germany
Christian Schulze  German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Daniel Keysers  German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Thomas Breuel  DFKI and Technical University - Kaiserslautern, Kaiserslautern, Germany
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1386352.1386358

ABSTRACT

A key problem in the automatic detection of semantic concepts (such as 'interview' or 'soccer') in video streams is the manual acquisition of adequate training sets. We recently proposed using online videos downloaded from portals like youtube.com for this purpose, with the tags that users provide at upload time serving as ground-truth annotations.

The problem with such training data is that it is weakly labeled: annotations are provided only at the video level, and many shots of a video may be "non-relevant", i.e., not visually related to the tag. In this paper, we present a probabilistic framework for learning from such weakly annotated training videos in the presence of irrelevant content. In this framework, the relevance of each keyframe is modeled as a latent random variable that is estimated during training.
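
The idea of treating keyframe relevance as a latent variable can be illustrated with a toy sketch. The code below is not the model from the paper, whose details are not given in this abstract. It assumes, purely for illustration, a single scalar feature per keyframe, a fixed Gaussian "background" density for irrelevant frames, and a Gaussian concept density fitted by expectation-maximization. The names `em_relevance`, `bg_mean`, and `bg_std` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_relevance(frames, bg_mean=0.0, bg_std=1.0, iters=50):
    """Toy EM: estimate a relevance posterior for each keyframe feature.

    Assumptions (illustrative only): irrelevant frames follow a fixed
    background Gaussian; relevant frames follow a concept Gaussian whose
    mean and standard deviation are learned from soft assignments.
    """
    n = len(frames)
    # Initialize the concept density from all frames of the tagged videos.
    mean = sum(frames) / n
    std = max(math.sqrt(sum((x - mean) ** 2 for x in frames) / n), 1e-3)
    prior = 0.5  # initial probability that a frame is relevant
    relevance = [prior] * n
    for _ in range(iters):
        # E-step: posterior probability that each frame is relevant.
        relevance = []
        for x in frames:
            p_rel = prior * gaussian_pdf(x, mean, std)
            p_bg = (1.0 - prior) * gaussian_pdf(x, bg_mean, bg_std)
            total = p_rel + p_bg
            relevance.append(p_rel / total if total > 0.0 else 0.5)
        # M-step: refit the concept density using relevance as soft weights.
        weight = max(sum(relevance), 1e-12)
        mean = sum(r * x for r, x in zip(relevance, frames)) / weight
        var = sum(r * (x - mean) ** 2 for r, x in zip(relevance, frames)) / weight
        std = max(math.sqrt(var), 1e-3)
        prior = min(max(weight / n, 1e-6), 1.0 - 1e-6)
    return relevance, mean


# Frames near 5 mimic concept-related content; frames near 0 mimic background.
frames = [0.1, -0.2, 0.3, 4.9, 5.2, 5.0, -0.1, 5.1]
relevance, concept_mean = em_relevance(frames)
for x, r in zip(frames, relevance):
    print(f"feature={x:5.2f}  relevance={r:.3f}")
```

In this sketch, frames that the background density already explains well receive low relevance and contribute little to the concept estimate, which mirrors the kind of robustness to irrelevant content the abstract describes.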

In quantitative experiments on real-world online videos and TV news data, we demonstrate that the proposed model is significantly more robust to irrelevant content and that the resulting concept detectors generalize better.



