Multi-cue fusion for semantic video indexing
Source
International Multimedia Conference
Proceedings of the 16th ACM International Conference on Multimedia
Vancouver, British Columbia, Canada
SESSION: Content track C2: semantic video annotation
Pages 71-80
Year of Publication: 2008
ISBN: 978-1-60558-303-7
Authors
Ming-Fang Weng  National Taiwan University, Taipei, Taiwan, ROC
Yung-Yu Chuang  National Taiwan University, Taipei, Taiwan, ROC
Sponsors
ACM: Association for Computing Machinery
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1459359.1459370

ABSTRACT

The huge number of videos currently available poses a difficult problem for semantic video retrieval. The success of query-by-concept, recently proposed to handle this problem, depends greatly on the accuracy of concept-based video indexing. This paper describes a multi-cue fusion approach to improving the accuracy of semantic video indexing. The approach is based on a unified framework that explores and integrates both contextual correlation among concepts and temporal dependency among shots. The framework is novel in two ways. First, a recursive algorithm is proposed to learn both inter-concept and inter-shot relationships from ground-truth annotations of tens of thousands of shots for hundreds of concepts. Second, labels for all concepts and all shots are solved simultaneously by optimizing a graphical model. Experiments on the widely used TRECVID 2006 data set show that the framework is effective for semantic concept detection in video, achieving around a 30% performance boost in inferred average precision on two popular benchmarks, VIREO-374 and Columbia374.
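
The abstract names two cues, contextual correlation among concepts and temporal dependency among shots, but gives no formulas. The sketch below only illustrates the general idea of combining those two cues to refine per-shot detector scores. It is not the paper's recursive learning algorithm or its graphical-model optimization: it substitutes a simple iterative weighted smoothing, and the function name fuse_scores, the concept_affinity matrix, and both weights are assumptions made for illustration.

```python
import numpy as np


def fuse_scores(scores, concept_affinity, context_weight=0.3,
                temporal_weight=0.3, iterations=5):
    """Refine detector scores with a contextual cue and a temporal cue.

    scores: (num_shots, num_concepts) array of detector outputs in [0, 1],
        with shots in temporal order.
    concept_affinity: (num_concepts, num_concepts) non-negative matrix whose
        rows sum to 1 (hypothetical; here it is simply given, not learned).
    """
    if context_weight < 0 or temporal_weight < 0:
        raise ValueError("weights must be non-negative")
    if context_weight + temporal_weight > 1:
        raise ValueError("weights must sum to at most 1")

    scores = np.asarray(scores, dtype=float)
    affinity = np.asarray(concept_affinity, dtype=float)
    base_weight = 1.0 - context_weight - temporal_weight

    refined = scores.copy()
    for _ in range(iterations):
        # Contextual cue: support for each concept from related concepts
        # detected in the same shot.
        context = refined @ affinity.T
        # Temporal cue: mean of the previous and next shot, with the first
        # and last shots replicated at the boundaries.
        padded = np.pad(refined, ((1, 1), (0, 0)), mode="edge")
        temporal = 0.5 * (padded[:-2] + padded[2:])
        # Convex combination keeps every refined score inside [0, 1].
        refined = (base_weight * scores
                   + context_weight * context
                   + temporal_weight * temporal)
    return refined


if __name__ == "__main__":
    # Three consecutive shots, two concepts; values are made up.
    detector_scores = np.array([[0.9, 0.2], [0.1, 0.3], [0.9, 0.2]])
    affinity = np.array([[0.8, 0.2], [0.2, 0.8]])
    print(fuse_scores(detector_scores, affinity))
```

In this toy setting, a shot whose score for a concept is low while both neighboring shots score high is pulled upward by the temporal term, which is the kind of inter-shot dependency the abstract refers to. Solving all labels jointly, as the paper describes, would require an actual inference procedure over a graphical model rather than this fixed-weight averaging.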

