ABSTRACT
In this paper, we present a model based on a multi-resolution, multi-source and multi-modal (M3) bootstrapping framework that exploits knowledge of sub-domains for concept detection in news video. Because the characteristics and distributions of data differ across sub-domains, we model and analyze the video in each sub-domain separately using a transductive framework. Along with this framework, we propose a "pseudo-Vapnik combined error bound" to tackle the imbalanced distribution of training data in certain segments of sub-domains. For effective fusion of multi-modal features, we use multi-resolution inference and constraints so that evidence from different modal features can support one another. Finally, we employ a bootstrapping technique that leverages unlabeled data to boost overall system performance. We test our framework by detecting semantic concepts in the TRECVID 2004 dataset. Experimental results demonstrate that our approach is effective.