ABSTRACT
We propose a robust scene recognition framework that exploits scene context information in multimedia content. Multimedia content consists of scene sequences, and some sequences are more likely to occur than others. We take a statistical approach to this scene context information: a hidden Markov model (HMM) models each scene, and an n-gram language model represents the context among scenes. We evaluated the proposed method in scene recognition experiments on 16 scenes in video data from 25 baseball games. The proposed method significantly improved recognition results compared with the same method without scene context information.
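The combination of per-scene models and an n-gram over scenes can be illustrated with a small decoding sketch. This is not the authors' implementation: it assumes the video is already split into segments, that each scene's HMM has already produced a log-likelihood for every segment, and that the context model is a bigram (n = 2). The function name `decode_scenes`, the `lm_weight` parameter, and all numbers below are hypothetical.

```python
import math


def decode_scenes(emission_logp, bigram_logp, initial_logp, lm_weight=1.0):
    """Viterbi search for the most likely scene sequence.

    emission_logp[t][s]: log-likelihood of segment t under scene s's model
                         (assumed to come from per-scene HMMs).
    bigram_logp[p][s]:   log P(scene s | previous scene p).
    initial_logp[s]:     log P(scene s at the start of the video).
    lm_weight:           hypothetical scaling factor for the context model.
    """
    n_scenes = len(initial_logp)
    score = [lm_weight * initial_logp[s] + emission_logp[0][s]
             for s in range(n_scenes)]
    backptrs = []
    for t in range(1, len(emission_logp)):
        new_score, ptr = [], []
        for s in range(n_scenes):
            # Best predecessor scene for scene s at segment t.
            best_prev = max(
                range(n_scenes),
                key=lambda p: score[p] + lm_weight * bigram_logp[p][s],
            )
            new_score.append(score[best_prev]
                             + lm_weight * bigram_logp[best_prev][s]
                             + emission_logp[t][s])
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final scene.
    path = [max(range(n_scenes), key=lambda s: score[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Toy data (made up): two scenes, three segments.
# The middle segment is acoustically/visually ambiguous.
emission = [[-1.0, -5.0], [-2.0, -2.5], [-5.0, -1.0]]
initial = [math.log(0.5), math.log(0.5)]
uniform = [[math.log(0.5)] * 2, [math.log(0.5)] * 2]    # no context
context = [[math.log(0.1), math.log(0.9)],              # scene 0 is usually followed by scene 1
           [math.log(0.5), math.log(0.5)]]

without_context = decode_scenes(emission, uniform, initial)
with_context = decode_scenes(emission, context, initial)
```

With the uniform bigram the ambiguous middle segment is assigned to scene 0, while the context bigram moves it to scene 1, which is the kind of correction scene context is meant to provide. Higher-order n-grams or joint segmentation and recognition would need a larger search space than this sketch covers.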
CITED BY
Ryoichi Ando, Koichi Shinoda, Sadaoki Furui, and Takahiro Mochizuki, "A robust scene recognition system for baseball broadcast using data-driven approach," Proceedings of the 6th ACM International Conference on Image and Video Retrieval, pp. 186--193, July 09-11, 2007, Amsterdam, The Netherlands.