Correlative multilabel video annotation with temporal kernels
Source
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Volume 5, Issue 1 (October 2008)
Article No. 3
Year of Publication: 2008
ISSN: 1551-6857
Authors
Guo-Jun Qi, University of Science and Technology of China, Anhui, China
Xian-Sheng Hua, Microsoft Corporation, Beijing, China
Yong Rui, Microsoft Corporation, Beijing, China
Jinhui Tang, University of Science and Technology of China, Anhui, China
Tao Mei, Microsoft Corporation, Beijing, China
Meng Wang, University of Science and Technology of China, Anhui, China
Hong-Jiang Zhang, Microsoft Corporation, Beijing, China
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1404880.1404883

ABSTRACT

Automatic video annotation is a key component of semantic-level video browsing, search, and navigation, and it has received considerable attention in recent years. Research on the topic has evolved through two paradigms. In the first, each concept is annotated individually by a pre-trained binary classifier. This approach ignores the rich correlations between video concepts and achieves only limited success. The second paradigm builds on the first by adding a fusion step on top of the individual classifiers to combine the detections of multiple concepts. However, errors made by the individual classifiers in the first step propagate to the fusion step and can degrade performance. This article proposes a third paradigm to address these problems. The proposed Correlative Multilabel (CML) method annotates the concepts and models the correlations between them simultaneously, in a single step, and so benefits from the complementary information shared between labels. Furthermore, because video clips are composed of temporally ordered frame sequences, we extend the method to exploit the rich temporal information in videos. Specifically, we incorporate into CML a temporal kernel based on the discriminative information between Hidden Markov Models (HMMs) learned from the videos. We compare the proposed approach with state-of-the-art approaches from the first and second paradigms on the widely used TRECVID data set; the proposed method achieves superior performance.
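
The contrast between per-concept annotation and one-step joint annotation, and the idea of a kernel built from a divergence between per-clip models, can be illustrated with a small sketch. It is not the article's formulation: the scoring function, the pairwise weights, the exhaustive search, and all function names are assumptions made for illustration. The kernel also swaps in 1-D Gaussians for HMMs, because the Kullback-Leibler divergence between Gaussians has a closed form; a similarity of the form exp(-D / 2σ²) is assumed here and is not guaranteed to be positive semidefinite in general.

```python
import itertools
import math


def independent_labels(scores):
    """Per-concept annotation: threshold each detector score on its own."""
    return tuple(1 if s > 0 else -1 for s in scores)


def joint_labels(scores, pair_weights):
    """Joint annotation (illustrative assumption, not the article's model).

    Picks the label vector y in {-1, +1}^K that maximizes
        sum_k y_k * scores[k] + sum_{(k, l)} pair_weights[(k, l)] * y_k * y_l.
    A positive pair weight rewards two concepts for taking the same label.
    Exhaustive search over 2^K vectors, so only practical for small K.
    """
    k = len(scores)

    def total(y):
        unary = sum(y[i] * scores[i] for i in range(k))
        pairwise = sum(w * y[i] * y[j] for (i, j), w in pair_weights.items())
        return unary + pairwise

    return max(itertools.product((-1, 1), repeat=k), key=total)


def sym_kl_gauss(m1, v1, m2, v2):
    """Symmetric KL divergence between two 1-D Gaussians (mean, variance)."""
    kl12 = 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl21 = 0.5 * (math.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1.0)
    return kl12 + kl21


def divergence_kernel(model_a, model_b, sigma=1.0):
    """Similarity exp(-D / (2 * sigma^2)) between two per-clip models.

    Each model is a (mean, variance) pair standing in for an HMM.
    """
    d = sym_kl_gauss(model_a[0], model_a[1], model_b[0], model_b[1])
    return math.exp(-d / (2.0 * sigma ** 2))
```

With hypothetical detector scores `[2.0, -0.3, 1.5]` and a single pair weight `{(0, 1): 1.0}`, `independent_labels` returns `(1, -1, 1)`, while `joint_labels` returns `(1, 1, 1)`: the weak negative score for the second concept is overridden by its assumed correlation with the confidently detected first concept. This is the kind of compensation between labels that the abstract attributes to modeling correlations in one step.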


REVIEW

Reviewer: Sebastien Lefevre

Annotation of multimedia data is a very topical yet very challenging problem. Indeed, Web sites such as YouTube store terabytes or even petabytes of video data. To successfully enable user navigation or retrieval in these huge databases, some auto...
