ABSTRACT
We present a probabilistic, ranking-driven classifier for detecting video semantic concepts such as airplane and building. Most existing concept detection systems use Support Vector Machines (SVM) to detect and rank retrieved video shots. However, the margin maximization principle of SVM minimizes classification error rather than optimizing ranking. To tackle this problem, we use a sparse Bayesian kernel model, the relevance vector machine (RVM), as the classifier for semantic concept detection. Based on the automatic relevance determination principle, RVM outputs a posterior probability for each semantic concept. According to the Probabilistic Ranking Principle, this output is optimal for ranking the target video shots. The probabilistic output of RVM on individual uni-modal features also facilitates probabilistic fusion of multi-modal evidence to minimize Bayes risk. We demonstrate both theoretically and empirically that RVM outperforms SVM for video semantic concept detection. Experiments on the TRECVID 2007 dataset show that RVM produces statistically significant improvements in MAP scores over SVM-based methods.
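To make the ranking and fusion ideas concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each modality's classifier has already produced a posterior P(concept | feature) per shot, and it fuses them with a naive Bayes product rule, which assumes conditional independence of the modalities given the concept; the abstract does not specify the fusion rule. The names `fuse_posteriors` and `rank_shots`, the shot identifiers, and the prior value are hypothetical.

```python
import math


def _logit(p, eps=1e-12):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse_posteriors(posteriors, prior):
    """Fuse per-modality posteriors for one shot.

    Assumption (not from the paper): modalities are conditionally
    independent given the concept, so in log-odds form
        logit P(c | x_1..x_M) = sum_m logit P(c | x_m) - (M - 1) * logit P(c).
    """
    z = sum(_logit(p) for p in posteriors)
    z -= (len(posteriors) - 1) * _logit(prior)
    # Numerically stable sigmoid to map log-odds back to a probability.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rank_shots(shot_posteriors, prior):
    """Return shot ids sorted by decreasing fused posterior probability."""
    fused = {shot: fuse_posteriors(ps, prior)
             for shot, ps in shot_posteriors.items()}
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical posteriors from two modalities (e.g. color, texture).
    shots = {"shot_a": [0.9, 0.8], "shot_b": [0.2, 0.3], "shot_c": [0.6, 0.5]}
    print(rank_shots(shots, prior=0.5))
```

Sorting by the fused posterior is the step that the ranking principle in the abstract motivates: shots are returned in decreasing order of their estimated probability of containing the concept.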