ABSTRACT
This paper proposes a new approach and algorithm for semantic concept annotation based on an audio PLSA (probabilistic latent semantic analysis) model. The novelty of our approach is twofold: audio vocabulary construction and the audio PLSA model. In audio vocabulary construction, we first segment an audio clip into a few homogeneous audio segments according to changes in its content, which not only captures how the clip changes over time but also preserves the relationships among, and temporal order of, the audio features. An audio vocabulary is then constructed by RPCL (rival penalized competitive learning) clustering of the audio segments, so that each audio clip can be represented in bag-of-words form. In the audio PLSA model, PLSA is employed to discover the latent topics present in the audio clips. Based on the discovered topics, concept classification is carried out by a support vector machine (SVM) classifier. In addition, we combine the local features extracted by PLSA with global features of the audio clip to further improve annotation performance. Experiments on 85 hours of audio data from TRECVID 2005 show encouraging results for our approach.
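As an illustration of the topic-discovery step described above, the following is a minimal sketch of PLSA fitted by expectation-maximization on a bag-of-words count matrix. It is not the authors' implementation: the function name `plsa`, its parameters, and the toy `counts` matrix are assumptions made for this example, and the segmentation and RPCL clustering that would produce the real audio vocabulary are not shown.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (n_clips, n_words) count matrix.

    Returns P(z|d) with shape (n_clips, n_topics) and
    P(w|z) with shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initialization, normalized to valid distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]  # (d, z, w)
        post /= post.sum(axis=1, keepdims=True) + eps

        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post          # (d, z, w)
        p_w_z = weighted.sum(axis=0)                  # (z, w)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)                  # (d, z)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z

# Hypothetical toy data: 4 audio clips over a 4-word audio vocabulary.
counts = np.array([[5, 4, 0, 0],
                   [4, 6, 0, 1],
                   [0, 0, 5, 5],
                   [0, 1, 6, 4]], dtype=float)

p_z_d, p_w_z = plsa(counts, n_topics=2)
print(np.round(p_z_d, 3))  # per-clip topic mixture
```

In a pipeline like the one the abstract outlines, each row of `p_z_d` (the per-clip topic mixture) would plausibly serve as the feature vector passed to the SVM classifier, optionally concatenated with global clip-level features.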