Identifying relevant frames in weakly labeled videos for training concept detectors

Source
Conference On Image And Video Retrieval
Proceedings of the 2008 international conference on Content-based image and video retrieval
Niagara Falls, Canada
SESSION: Tagging, training and classification
Pages 9-16
Year of Publication: 2008
ISBN: 978-1-60558-070-8

Authors
Adrian Ulges, Technical University, Kaiserslautern, Germany
Christian Schulze, German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Daniel Keysers, German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Thomas Breuel, DFKI and Technical University - Kaiserslautern, Kaiserslautern, Germany

ABSTRACT
A key problem in the automatic detection of semantic concepts (like 'interview' or 'soccer') in video streams is the manual acquisition of adequate training sets. Recently, we proposed using online videos downloaded from portals like youtube.com for this purpose, with the tags that users provide during video upload serving as ground truth annotations. The problem with such training data is that it is weakly labeled: annotations are only provided at the video level, and many shots of a video may be "non-relevant", i.e., not visually related to a tag. In this paper, we present a probabilistic framework for learning from such weakly annotated training videos in the presence of irrelevant content. The relevance of each keyframe is modeled as a latent random variable that is estimated during training. In quantitative experiments on real-world online videos and TV news data, we demonstrate that the proposed model significantly increases robustness to irrelevant content and improves the generalization of the resulting concept detectors.
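
To make the idea of a latent relevance variable concrete, the following is a minimal toy sketch and not the model from the paper. It assumes, purely for illustration, that each keyframe is described by a single scalar feature, that irrelevant frames follow a known background Gaussian, and that relevant frames follow a Gaussian with unknown mean. Expectation-maximization then alternates between estimating each frame's relevance probability and re-fitting the concept model. All function names and parameters (`fit_relevance_em`, `make_toy_frames`, `background_mean`, and so on) are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std=1.0):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_relevance_em(features, background_mean=0.0, init_mean=1.0,
                     init_prior=0.5, iterations=50):
    """Toy EM with a latent per-frame relevance variable (illustrative only).

    Assumptions: unit-variance Gaussians, a fixed and known background
    density for irrelevant frames, and an unknown concept mean.
    Returns (concept_mean, relevance_prior, per_frame_relevance).
    """
    mean, prior = init_mean, init_prior
    relevance = [prior] * len(features)
    for _ in range(iterations):
        # E-step: posterior probability that each frame is relevant.
        relevance = []
        for x in features:
            rel = prior * gaussian_pdf(x, mean)
            irr = (1.0 - prior) * gaussian_pdf(x, background_mean)
            relevance.append(rel / (rel + irr))
        # M-step: re-fit the concept mean and the relevance prior,
        # weighting every frame by its estimated relevance.
        total = sum(relevance)
        mean = sum(r * x for r, x in zip(relevance, features)) / total
        prior = total / len(features)
    return mean, prior, relevance


def make_toy_frames(n_relevant, n_irrelevant, concept_mean, seed=0):
    """Synthetic frames of one weakly labeled video: relevant frames first."""
    rng = random.Random(seed)
    relevant = [rng.gauss(concept_mean, 1.0) for _ in range(n_relevant)]
    irrelevant = [rng.gauss(0.0, 1.0) for _ in range(n_irrelevant)]
    return relevant + irrelevant


if __name__ == "__main__":
    frames = make_toy_frames(300, 700, concept_mean=4.0)
    concept_mean, relevance_prior, _ = fit_relevance_em(frames)
    print(concept_mean, relevance_prior)
```

In this toy setting, frames that resemble the background receive a low relevance weight and therefore contribute little to the concept estimate. A real system would replace the scalar feature and the Gaussians with actual visual features and density models.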