ABSTRACT
Semantic video indexing is critical for practical video retrieval systems, and a generic, scalable indexing framework is essential for indexing a large semantic lexicon of over 1000 concepts. This paper explores the idea of incorporating many diverse features into a single framework and combining them to obtain a larger degree of invariance than any component feature provides, thereby achieving genericness and scalability. We reduce the formidable computational expense through careful design of the classification and fusion schemes. Specifically, about 20 kinds of diverse features are extracted to capture limited yet complementary variance in color, texture, and edge, with spatial constraints implicitly integrated; over 100 classifiers are then built and fused to produce a generic detector. Extensive experiments on a total of 310 hours of TRECVID news videos show that the proposed framework yields significantly improved performance over the best single feature across a variety of concepts. Moreover, a benchmark comparison demonstrates that this approach is state-of-the-art. The proposed approach also generalizes well to previously unseen programs and stations and scales well to a lexicon of over 300 concepts in the LSCOM [18] ontology.
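As a rough illustration of the "many classifiers fused into one detector" idea, the sketch below averages per-shot confidence scores from several classifiers and ranks shots by the fused score. The abstract does not specify the fusion scheme, so plain score averaging is an assumption made here for illustration only, and the names `fuse_scores` and `rank_shots` are hypothetical.

```python
from statistics import mean


def fuse_scores(scores_per_classifier):
    """Average per-shot scores across classifiers (assumed late-fusion rule).

    scores_per_classifier: list of lists; each inner list holds one
    classifier's confidence for every shot, in the same shot order.
    """
    if not scores_per_classifier:
        raise ValueError("need at least one classifier")
    n_shots = len(scores_per_classifier[0])
    if any(len(s) != n_shots for s in scores_per_classifier):
        raise ValueError("all classifiers must score the same shots")
    # zip(*...) regroups the scores shot by shot before averaging.
    return [mean(shot_scores) for shot_scores in zip(*scores_per_classifier)]


def rank_shots(fused):
    """Return shot indices sorted by descending fused score."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


if __name__ == "__main__":
    # Two hypothetical classifiers (e.g. one per feature) scoring three shots.
    fused = fuse_scores([[0.25, 1.0, 0.5], [0.75, 0.5, 0.0]])
    print(fused, rank_shots(fused))
```

Averaging is only one of several possible fusion rules; weighted or learned combinations would slot into `fuse_scores` without changing the ranking step.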
REFERENCES
[1] A. Amir, J. Argillander, M. Campbell, A. Haubold, G. Iyengar, S. Ebadollahi, F. Kang, M. R. Naphade, A. P. Natsev, J. R. Smith, J. Tešić, and T. Volkmer. IBM Research TRECVID-2005 video retrieval system. In Proc. of TRECVID workshop, 2006.
[2] A. Amir et al. IBM Research TRECVID-2003 video retrieval system. In Proc. of TRECVID workshop, 2004.
[4] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In Proc. of ECCV, 2006.
[5] S.-F. Chang, W. Hsu, W. Jiang, L. Kennedy, D. Xu, A. Yanagawa, and E. Zavesky. Columbia University TRECVID-2006 video search and high-level feature extraction. In Proc. of TRECVID workshop, 2007.
[6] S.-F. Chang, W. Hsu, W. Jiang, L. Kennedy, D. Xu, A. Yanagawa, and E. Zavesky. Evaluating the impact of 374 visual-based LSCOM concept detectors on automatic search. www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html.
[7] S.-F. Chang, W. Hsu, L. Kennedy, L. Xie, A. Yanagawa, E. Zavesky, and D.-Q. Zhang. Columbia University TRECVID-2005 video search and high-level feature extraction. In Proc. of TRECVID workshop, 2007.
[8] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, at ECCV, 2004.
[11] J. Fan, A. Elmagarmid, X. Zhu, W. Aref, and L. Wu. ClassView: hierarchical video shot classification, indexing, and accessing. IEEE Trans. Multimedia, 6(1):70-86, 2004.
[13] A. Hauptmann, M.-Y. Chen, M. Christel, W.-H. Lin, R. Yan, and J. Yang. Multi-lingual broadcast news retrieval. In Proc. of TRECVID workshop, 2007.
[14] A. Hauptmann, M. Christel, R. Concescu, J. Gao, Q. Jin, W.-H. Lin, J.-Y. Pan, S. M. Stevens, R. Yan, J. Yang, and Y. Zhang. CMU Informedia's TRECVID 2005 skirmishes. In Proc. of TRECVID workshop, 2006.
[15] W. Jiang, S.-F. Chang, and A. C. Loui. Context-based concept fusion with boosted conditional random fields. In Proc. of ICASSP, Hawaii, USA, April 2007.
[16] X. Li, D. Wang, J. Li, and B. Zhang. Video search in concept subspace: a text-like paradigm. In Proc. of the 6th ACM International Conference on Image and Video Retrieval, pages 603-610, Amsterdam, The Netherlands, July 2007. doi:10.1145/1282280.1282366
[17] X. Liu, D. Wang, J. Li, and B. Zhang. The feature and spatial covariant kernel: adding implicit spatial constraints to histogram. In Proc. of the 6th ACM International Conference on Image and Video Retrieval, pages 565-572, Amsterdam, The Netherlands, July 2007. doi:10.1145/1282280.1282361
[18] M. Naphade, J. R. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann, and J. Curtis. Large-scale concept ontology for multimedia. IEEE MultiMedia, 13(3):86-91, July 2006. doi:10.1109/MMUL.2006.63
[19] M. R. Naphade, L. Kennedy, J. R. Kender, S.-F. Chang, J. R. Smith, P. Over, and A. Hauptmann. A light scale concept ontology for multimedia understanding for TRECVID 2005. 2005. www-nlpir.nist.gov/projects/tv2005/LSCOMlite_NKKCSOH.pdf.
[21] J. Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report MSR-TR-98-14, Microsoft Research, 1998.
[22] J. Platt. Probabilities for SV machines. In Advances in Large Margin Classifiers, pages 61-74. MIT Press, 2000.
[23] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019-1025, 1999.
[26] C. Snoek, J. van Gemert, J. Geusebroek, B. Huurnink, D. Koelma, G. Nguyen, O. de Rooij, F. Seinstra, A. Smeulders, C. Veenman, and M. Worring. The MediaMill TRECVID 2005 semantic video search engine. In Proc. of TRECVID workshop, 2006.
[27] C. Snoek, J. van Gemert, T. Gevers, B. Huurnink, D. Koelma, M. van Liempt, O. de Rooij, K. van de Sande, F. Seinstra, A. Smeulders, A. Thean, C. Veenman, and M. Worring. The MediaMill TRECVID 2006 semantic video search engine. In Proc. of TRECVID workshop, 2007.
[28] C. G. M. Snoek, M. Worring, J.-M. Geusebroek, D. C. Koelma, F. J. Seinstra, and A. W. M. Smeulders. The semantic pathfinder: Using an authoring metaphor for generic multimedia indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1678-1689, October 2006. doi:10.1109/TPAMI.2006.212
[30] D. Wang, J. Li, and B. Zhang. Relay boost fusion for learning rare concepts in multimedia. In Proc. of CIVR, 2006.
[32] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: An in-depth study. Technical Report RR-5737, INRIA Rhône-Alpes, 2005.
CITED BY 9
Guo-Jun Qi, Xian-Sheng Hua, Yong Rui, Jinhui Tang, Tao Mei, Meng Wang, and Hong-Jiang Zhang. Correlative multilabel video annotation with temporal kernels. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 5(1):1-27, October 2008.
Dong Wang, Zhikun Wang, Jianmin Li, Bo Zhang, and Xirong Li. Query representation by structured concept threads with application to interactive video retrieval. Journal of Visual Communication and Image Representation, 20(2):104-116, February 2009.
Meng Wang, Xian-Sheng Hua, Richang Hong, Jinhui Tang, Guo-Jun Qi, and Yan Song. Unified video annotation via multigraph learning. IEEE Transactions on Circuits and Systems for Video Technology, 19(5):733-746, May 2009.