ABSTRACT
Transductive video concept detection is an effective way to cope with the scarcity of labeled videos. However, existing transductive methods do not adequately address a second issue, multi-label interdependence. Most solutions either apply a transductive single-label approach to detect each concept separately, ignoring the relations between concepts, or simply impose a smoothness assumption over the multiple labels of each video without actually exploring the interdependence between concepts. On the other hand, semi-supervised extensions of supervised multi-label classifiers, such as correlative multi-label support vector machines, are usually intractable, and hence impractical, because of their high computational cost. In this paper, we propose an effective transductive multi-label classification approach that simultaneously models the labeling consistency between visually similar videos and the multi-label interdependence for each video in an integrated framework. We compare the proposed approach with several representative transductive single-label and supervised multi-label classification approaches on the video concept detection task over the widely used TRECVID data set. The comparative results demonstrate the superiority of the proposed approach.
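To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a generic two-graph label propagation: scores are smoothed over a video similarity graph (labeling consistency) and over a concept correlation graph (multi-label interdependence). It is not the approach proposed in the paper, whose formulation is not given here. The update rule, the function names `row_normalize` and `propagate`, the weights `alpha` and `beta`, and the toy data are all assumptions made for illustration.

```python
# Illustrative sketch only: a generic two-graph label propagation, NOT the
# paper's algorithm. All names, weights, and data below are hypothetical.

def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def propagate(W, C, Y, alpha=0.5, beta=0.3, iters=200):
    """Iterate F <- alpha*S*F + beta*F*R^T + (1 - alpha - beta)*Y.

    W: n x n non-negative video similarity matrix.
    C: c x c non-negative concept correlation matrix.
    Y: n x c initial labels (+1, -1, or 0 for unlabeled videos).
    S and R are the row-normalized W and C. With alpha + beta < 1 each
    update is a contraction, so the iteration converges.
    """
    assert alpha >= 0 and beta >= 0 and alpha + beta < 1
    S, R = row_normalize(W), row_normalize(C)
    n, c = len(Y), len(Y[0])
    gamma = 1.0 - alpha - beta
    F = [row[:] for row in Y]
    for _ in range(iters):
        new_F = []
        for i in range(n):
            new_row = []
            for j in range(c):
                # Average score of concept j over videos similar to video i.
                from_videos = sum(S[i][k] * F[k][j] for k in range(n))
                # Average score on video i over concepts correlated with j.
                from_concepts = sum(R[j][k] * F[i][k] for k in range(c))
                new_row.append(
                    alpha * from_videos + beta * from_concepts + gamma * Y[i][j]
                )
            new_F.append(new_row)
        F = new_F
    return F


# Toy data: videos 0 and 3 are labeled, videos 1 and 2 are unlabeled.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
C = [[0.0, 1.0], [1.0, 0.0]]  # the two concepts tend to co-occur
Y = [[1, 1], [0, 0], [0, 0], [-1, -1]]

scores = propagate(W, C, Y)
for i, row in enumerate(scores):
    print(i, [round(v, 3) for v in row])
```

In this toy setup, unlabeled video 1 sits next to the positively labeled video 0 and ends up with positive scores for both concepts, while unlabeled video 2 sits next to the negatively labeled video 3 and ends up with negative scores. Setting `beta=0` removes the concept term and reduces the sketch to independent single-label propagation per concept, which is the baseline style the abstract argues against.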