ABSTRACT
The huge number of videos now available makes semantic video retrieval a difficult problem. The success of query-by-concept, recently proposed to address this problem, depends greatly on the accuracy of concept-based video indexing. This paper describes a multi-cue fusion approach to improving the accuracy of semantic video indexing. The approach is based on a unified framework that explores and integrates both contextual correlation among concepts and temporal dependency among shots. The framework is novel in two ways. First, a recursive algorithm is proposed to learn both inter-concept and inter-shot relationships from ground-truth annotations of tens of thousands of shots for hundreds of concepts. Second, labels for all concepts and all shots are solved simultaneously by optimizing a graphical model. Experiments on the widely used TRECVID 2006 data set show that our framework is effective for semantic concept detection in video, improving inferred average precision by around 30% on two popular benchmarks, VIREO-374 and Columbia374.