| Correlative multi-label video annotation |
Source
International Multimedia Conference
Proceedings of the 15th international conference on Multimedia
Augsburg, Germany
SESSION: Best papers session
Pages: 17-26
Year of Publication: 2007
ISBN: 978-1-59593-702-5
Authors
Guo-Jun Qi, University of Science and Technology of China, Hefei, China
Xian-Sheng Hua, Microsoft Research Asia, Beijing, China
Yong Rui, Microsoft China R&D Group, Beijing, China
Jinhui Tang, University of Science and Technology of China, Hefei, China
Tao Mei, Microsoft Research Asia, Beijing, China
Hong-Jiang Zhang, Microsoft Research Advanced Technology Center, Beijing, China
Bibliometrics
Downloads (6 Weeks): 45, Downloads (12 Months): 232, Citation Count: 17
ABSTRACT
Automatically annotating video with concepts is key to semantic-level video browsing, search, and navigation. Research on this topic has evolved through two paradigms. The first paradigm uses binary classification to detect each concept in a concept set individually. It has achieved only limited success because it does not model the inherent correlations between concepts, e.g., urban and building. The second paradigm adds a second step on top of the individual concept detectors to fuse multiple concepts. Its performance varies, however, because errors incurred in the first (detection) step can propagate to the second (fusion) step and degrade overall performance. To address these issues, we propose a third paradigm that simultaneously classifies concepts and models the correlations between them in a single step, using a novel Correlative Multi-Label (CML) framework. We compare the proposed approach with state-of-the-art approaches from the first and second paradigms on the widely used TRECVID data set and report superior performance.
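The contrast between the first paradigm and single-step joint classification can be illustrated with a toy sketch. It is not the authors' CML implementation: it assumes a simple linear score over a whole label vector, made of per-concept terms plus pairwise label-correlation terms, and predicts by brute-force enumeration. All names, weights, and feature values (`weights`, `pair_weights`, `x`) are hypothetical and chosen only for illustration.

```python
from itertools import product

CONCEPTS = ("urban", "building")  # example concepts taken from the abstract


def detector_score(w, x):
    """Linear score of a single concept detector (hypothetical weights)."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x))


def independent_predict(x, weights):
    """First paradigm: each concept is decided on its own, with no interaction."""
    return tuple(1 if detector_score(w, x) >= 0 else -1 for w in weights)


def joint_score(x, y, weights, pair_weights):
    """Score a whole label vector y in {-1, +1}^K in a single step.

    Assumed form: per-concept terms plus pairwise terms that reward
    label pairs agreeing with a learned correlation.
    """
    unary = sum(y_k * detector_score(w, x) for y_k, w in zip(y, weights))
    pairwise = sum(c * y[p] * y[q] for (p, q), c in pair_weights.items())
    return unary + pairwise


def joint_predict(x, weights, pair_weights):
    """Pick the best-scoring label vector by enumeration (fine for small K)."""
    candidates = product((-1, 1), repeat=len(weights))
    return max(candidates, key=lambda y: joint_score(x, y, weights, pair_weights))


# Hypothetical feature vector and parameters for one video shot.
x = [1.0, 0.2]
weights = [[1.0, 0.0],    # "urban" detector: confident positive
           [-0.5, 1.0]]   # "building" detector: weakly negative on its own
pair_weights = {(0, 1): 0.5}  # assumed positive urban/building correlation

print(independent_predict(x, weights))          # (1, -1)
print(joint_predict(x, weights, pair_weights))  # (1, 1)
```

In this toy setting the weak "building" evidence is overridden by its assumed correlation with the confidently detected "urban" concept, and the decision is made in one step rather than by fusing the outputs of separate detectors. Enumeration grows exponentially with the number of concepts, so it only serves to show the idea.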
CITED BY 18
Jingdong Wang, Yinghai Zhao, Xiuqing Wu, Xian-Sheng Hua, Transductive multi-label learning for video concept detection, Proceeding of the 1st ACM international conference on Multimedia information retrieval, October 30-31, 2008, Vancouver, British Columbia, Canada
Jiebo Luo, Jie Yu, Dhiraj Joshi, Wei Hao, Event recognition: viewing the world with a third eye, Proceeding of the 16th ACM international conference on Multimedia, October 26-31, 2008, Vancouver, British Columbia, Canada
Bo Geng, Linjun Yang, Chao Xu, Xian-Sheng Hua, Collaborative learning for image and video annotation, Proceeding of the 1st ACM international conference on Multimedia information retrieval, October 30-31, 2008, Vancouver, British Columbia, Canada
Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li, Flickr distance, Proceeding of the 16th ACM international conference on Multimedia, October 26-31, 2008, Vancouver, British Columbia, Canada
Yanan Liu, Fei Wu, Yueting Zhuang, Jun Xiao, Active post-refined multimodality video semantic concept detection with tensor representation, Proceeding of the 16th ACM international conference on Multimedia, October 26-31, 2008, Vancouver, British Columbia, Canada
Zheng-Jun Zha, Tao Mei, Jingdong Wang, Zengfu Wang, Xian-Sheng Hua, Graph-based semi-supervised learning with multiple labels, Journal of Visual Communication and Image Representation, v.20 n.2, p.97-103, February 2009
Jinhui Tang, Xian-Sheng Hua, Meng Wang, Zhiwei Gu, Guo-Jun Qi, Xiuqing Wu, Correlative linear neighborhood propagation for video annotation, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, v.39 n.2, p.409-416, April 2009