Heuristic approach for generic audio data segmentation and annotation

Source
International Multimedia Conference
Proceedings of the seventh ACM international conference on Multimedia (Part 1)
Orlando, Florida, United States
Pages: 67 - 76
Year of Publication: 1999
ISBN: 1-58113-151-8

Authors
Tong Zhang
Integrated Media Systems Center and Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA
C.-C. Jay Kuo
Integrated Media Systems Center and Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA
ABSTRACT
A real-time audio segmentation and indexing scheme is presented in this paper. Audio recordings are segmented and classified into basic audio types such as silence, speech, music, song, environmental sound, speech with music background, and environmental sound with music background. Simple audio features, namely the energy function, the average zero-crossing rate, the fundamental frequency, and the spectral peak track, are adopted so that the system can run on-line. Morphological and statistical analyses of the temporal curves of these features are performed to show the differences among audio types. A heuristic rule-based procedure is then developed to segment and classify audio signals using these features. The proposed approach is generic and model-free, and it can be applied to almost any content-based audio management system. The proposed scheme is shown to achieve an accuracy rate of more than 90% for audio classification. Examples of segmentation and indexing of the accompanying audio signals in movies and video programs are also provided.
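Two of the features named in the abstract, the energy function and the average zero-crossing rate, can be illustrated with a minimal sketch. The frame length, hop size, thresholds, label names, and the `label_frame` rule below are all hypothetical choices made for illustration; they are not the rules or parameters used in the paper, which are not given in this record.

```python
import math

def frames(signal, frame_len, hop):
    """Yield successive frames of frame_len samples, advancing by hop."""
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield signal[start:start + frame_len]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def label_frame(frame, silence_energy=1e-4, noisy_zcr=0.3):
    """Toy heuristic rule; both thresholds are assumptions, not from the paper."""
    if short_time_energy(frame) < silence_energy:
        return "silence"
    if zero_crossing_rate(frame) > noisy_zcr:
        return "high-zcr"
    return "low-zcr"

# Synthetic input: 0.1 s of digital silence followed by 0.1 s of a 200 Hz tone.
sample_rate = 8000
quiet = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
labels = [label_frame(f) for f in frames(quiet + tone, 400, 400)]
print(labels)
```

A segment boundary would then be placed wherever the label changes between consecutive frames. A practical system would likely smooth the per-frame labels over time before cutting segments, and would add the fundamental frequency and spectral peak track that the abstract also lists.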
REFERENCES
1
Boreczky, J. S. and Wilcox, L. D.: A hidden Markov model framework for video segmentation using audio and image features, in Proceedings of ICASSP'98, pp.3741-3744, Seattle, May 1998.
2
Foote, J.: Content-based retrieval of music and audio, in Proceedings of SPIE'97, Dallas, 1997.
3
Asif Ghias, Jonathan Logan, David Chamberlin, Brian C. Smith, Query by humming: musical information retrieval in an audio database, Proceedings of the third ACM international conference on Multimedia, p.231-236, November 05-09, 1995, San Francisco, California, United States
[doi> 10.1145/217279.215273]
4
Kimber, D. and Wilcox, L.: Acoustic segmentation for audio browsers, in Proceedings of Interface Conference, Sydney, Australia, July 1996.
5
Liu, Z., Huang, J., Wang, Y. et al.: Audio feature extraction and analysis for scene classification, in Proceedings of IEEE 1st Multimedia Workshop, 1997.
6
Naphade, M. R., Kristjansson, T., Frey, B. et al.: Probabilistic multimedia objects (MULTIJECTS): a novel approach to video indexing and retrieval in multimedia systems, in Proceedings of IEEE Conference on Image Processing, Chicago, Oct. 1998.
7
Patel, N. and Sethi, I.: Audio characterization for video indexing, in Proceedings of SPIE Conference on Storage and Retrieval for Still Image and Video Databases, vol.2670, pp.373-384, San Jose, 1996.
8
Saunders, J.: Real-time discrimination of broadcast speech/music, in Proceedings of ICASSP'96, vol. II, pp.993-996, May 1996.
9

10
Erling Wold, Thom Blum, Douglas Keislar, James Wheaton, Content-Based Classification, Search, and Retrieval of Audio, IEEE MultiMedia, v.3 n.3, p.27-36, September 1996
[doi> 10.1109/93.556537]
11
Wyse, L. and Smoliar, S.: Toward content-based audio indexing and retrieval and a new speaker discrimination technique, in http://www.iss.nus.sg/People/lwyse/lwyse.html, Dec. 1995.