Active post-refined multimodality video semantic concept detection with tensor representation
Source
International Multimedia Conference
Proceedings of the 16th ACM International Conference on Multimedia
Vancouver, British Columbia, Canada
SESSION: Content track C2: semantic video annotation
Pages 91-100  
Year of Publication: 2008
ISBN:978-1-60558-303-7
Authors
Yanan Liu  Zhejiang University, Hangzhou, China
Fei Wu  Zhejiang University, Hangzhou, China
Yueting Zhuang  Zhejiang University, Hangzhou, China
Jun Xiao  Zhejiang University, Hangzhou, China
Sponsors
ACM: Association for Computing Machinery
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1459359.1459372

ABSTRACT

In this paper, we address the problem of multi-modality video representation and semantic concept detection. The interaction and integration of the modalities in video, such as visual, audio, and textual data, are essential to video semantic analysis. Traditionally, videos are represented as vectors in Euclidean space, and learning algorithms are then applied to these high-dimensional vectors for dimension reduction, classification, clustering, and so on. However, the modalities in video not only have their own properties but are also correlated with one another; a simple vector representation weakens the power of these relatively independent modalities and to some extent ignores the relations among them. We introduce a higher-order tensor framework for video analysis in which the image, audio, and text modalities of each video shot are represented as a data point by a 3rd-order tensor called a tensorshot. We propose a novel dimension reduction method that explicitly considers the manifold structure of the tensor space arising from the temporally associated co-occurrence of multimodal media data, and we then detect video semantic concepts with classifiers that take tensors as input. Our algorithm preserves the intrinsic structure of the submanifold from which tensorshots are sampled and can also map out-of-sample data points directly. Moreover, we apply an active-learning-based contextual and temporal post-refining strategy to enhance detection accuracy. Experimental results show that our method improves the performance of video semantic concept detection.
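
To make the idea of a 3rd-order tensor representation concrete, the following is a minimal NumPy sketch, not the authors' method. It assumes, purely for illustration, that a shot's tensor is the outer product of three per-modality feature vectors; the abstract does not state how the tensor is actually built. The function names, feature sizes, and random projection matrices are all hypothetical; in the paper the projections would be learned so as to preserve manifold structure. The sketch only shows the mechanics: one linear projection per mode reduces each dimension, and because the mapping is linear it can be applied directly to a shot that was not seen during training.

```python
import numpy as np


def build_tensorshot(image_feat, audio_feat, text_feat):
    """Combine three modality vectors into one 3rd-order tensor.

    Assumption: the outer product is used only for illustration; the
    abstract does not specify the construction.
    """
    return np.einsum("i,j,k->ijk", image_feat, audio_feat, text_feat)


def mode_product(tensor, matrix, mode):
    """Mode-n product: apply `matrix` (J x I_n) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)                # (I_n, ...)
    result = np.tensordot(matrix, moved, axes=(1, 0))   # (J, ...)
    return np.moveaxis(result, 0, mode)


def project_tensorshot(tensor, projections):
    """Apply one projection matrix per mode to reduce every dimension."""
    for mode, matrix in enumerate(projections):
        tensor = mode_product(tensor, matrix, mode)
    return tensor


rng = np.random.default_rng(0)

# Hypothetical feature sizes: 8 image, 6 audio, 4 text dimensions.
shot = build_tensorshot(rng.random(8), rng.random(6), rng.random(4))

# Hypothetical random projections standing in for learned ones.
projections = [rng.random((3, 8)), rng.random((2, 6)), rng.random((2, 4))]

reduced = project_tensorshot(shot, projections)
print(shot.shape, "->", reduced.shape)
```

Keeping one projection matrix per mode means each modality is reduced along its own axis instead of being flattened into a single long vector, which is the contrast with the vector representation that the abstract draws.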



