Topic transition detection using hierarchical hidden Markov and semi-Markov models
Source International Multimedia Conference archive
Proceedings of the 13th annual ACM international conference on Multimedia
Hilton, Singapore
SESSION: Content 1: news video processing
Pages: 11-20
Year of Publication: 2005
ISBN: 1-59593-044-2
Authors
Dinh Q. Phung  Curtin University of Technology, Perth, Western Australia
T. V. Duong  Curtin University of Technology, Perth, Western Australia
S. Venkatesh  Curtin University of Technology, Perth, Western Australia
Hung H. Bui  SRI International, Menlo Park, CA
Sponsors
ACM: Association for Computing Machinery
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1101149.1101153

ABSTRACT

In this paper we introduce a probabilistic framework that exploits hierarchy, structure sharing, and duration information for topic transition detection in videos. The framework combines a shot classification step with a detection phase that uses hierarchical probabilistic models. We consider two models: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM). Both allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, which enables efficient inference and reduces the sample complexity of learning. The S-HSMM additionally incorporates duration information, so the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. Furthermore, the use of the Coxian distribution in the S-HSMM makes it tractable to handle long video sequences. Our experiments with the proposed framework on twelve educational and training videos show that both models outperform the baseline cases (flat HMM and HSMM) and the performance reported in earlier work on topic detection. The superior performance of the S-HSMM over the HHMM confirms our belief that duration information is an important factor in video content modeling.
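
The abstract does not define the Coxian duration model. As a rough illustration only, the sketch below assumes one common discrete formulation: a left-to-right chain of M phases, where the process starts in phase i with probability mu[i], stays in each phase for a geometrically distributed number of steps with parameter lam[j], and ends after the last phase. The function names, the parameter names mu and lam, and the formulation itself are assumptions made for this example and are not taken from the paper. Under this assumption a duration distribution is described by 2M numbers, however long the durations get.

```python
import random


def sample_geometric(p, rng):
    """Trials up to and including the first success (support 1, 2, ...)."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n


def sample_coxian(mu, lam, rng):
    """Draw one duration from the assumed discrete Coxian model.

    mu[i]  : probability of starting in phase i (hypothetical name)
    lam[j] : per-step exit probability of phase j (hypothetical name)
    """
    start = rng.choices(range(len(mu)), weights=mu)[0]
    # Pass through every phase from the start phase to the last one.
    return sum(sample_geometric(lam[j], rng) for j in range(start, len(lam)))


def coxian_mean(mu, lam):
    """Expected duration: each phase j contributes 1 / lam[j] steps."""
    return sum(m * sum(1.0 / l for l in lam[i:]) for i, m in enumerate(mu))
```

For example, with mu = [0.5, 0.5] and lam = [0.5, 0.25], coxian_mean gives 0.5 * (2 + 4) + 0.5 * 4 = 5 steps.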


