Correlative multi-label video annotation
Source
International Multimedia Conference
Proceedings of the 15th International Conference on Multimedia
Augsburg, Germany
Session: Best papers session
Pages: 17-26
Year of Publication: 2007
ISBN: 978-1-59593-702-5
Authors
Guo-Jun Qi  University of Science and Technology of China, Hefei, China
Xian-Sheng Hua  Microsoft Research Asia, Beijing, China
Yong Rui  Microsoft China R&D Group, Beijing, China
Jinhui Tang  University of Science and Technology of China, Hefei, China
Tao Mei  Microsoft Research Asia, Beijing, China
Hong-Jiang Zhang  Microsoft Research Advanced Technology Center, Beijing, China
Sponsors
ACM: Association for Computing Machinery
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1291233.1291245

ABSTRACT

Automatically annotating video with concepts is key to semantic-level video browsing, search, and navigation. Research on this topic has evolved through two paradigms. The first uses binary classification to detect each concept in a concept set individually. It has achieved only limited success because it does not model the inherent correlations between concepts, e.g., urban and building. The second adds a fusion step on top of the individual concept detectors to combine multiple concepts. However, its performance is inconsistent, because errors incurred in the first (detection) step can propagate to the second (fusion) step and degrade overall performance. To address these issues, we propose a third paradigm that simultaneously classifies concepts and models the correlations between them in a single step, using a novel Correlative Multi-Label (CML) framework. We compare the proposed approach with state-of-the-art approaches from the first and second paradigms on the widely used TRECVID data set and report superior performance.
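
The contrast between detecting each concept independently and scoring all labels jointly can be illustrated with a toy sketch. This is not the CML framework from the paper, whose formulation is not given in this abstract. It assumes a hypothetical joint score made of per-concept terms plus pairwise label-correlation terms, maximized by brute-force enumeration. The concept names, score values, and the functions independent_labels and joint_labels are all made up for illustration.

```python
from itertools import product

def independent_labels(unary):
    """First-paradigm style: threshold each concept score on its own."""
    return tuple(1 if s > 0 else -1 for s in unary)

def joint_labels(unary, pairwise):
    """Pick the label vector in {-1, +1}^K with the highest joint score.

    The joint score (an assumption for this sketch) is the sum of
    per-concept terms s_k * y_k plus pairwise terms w_jk * y_j * y_k.
    Enumeration is exponential in K, so this only suits tiny K.
    """
    def score(y):
        u = sum(s * yi for s, yi in zip(unary, y))
        p = sum(w * y[i] * y[j] for (i, j), w in pairwise.items())
        return u + p
    return max(product((-1, 1), repeat=len(unary)), key=score)

# Hypothetical values: a confident "urban" score, a weakly negative
# "building" score, and a positive urban-building correlation weight.
concepts = ["urban", "building", "sky"]
unary = [0.9, -0.2, -0.5]
pairwise = {(0, 1): 0.6}

print(dict(zip(concepts, independent_labels(unary))))
print(dict(zip(concepts, joint_labels(unary, pairwise))))
```

With these made-up numbers, independent thresholding rejects "building", while the joint score accepts it because the urban-building correlation term outweighs the weak negative evidence. The labels are decided in one step rather than being corrected by a later fusion stage.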

