A robust audio classification and segmentation method
Source: International Multimedia Conference; Vol. 9
Proceedings of the ninth ACM international conference on Multimedia
Ottawa, Canada
Session: Audio Processing
Pages: 203-211
Year of Publication: 2001
ISBN: 1-58113-394-4
Authors
Lie Lu  Microsoft Research, China, Beijing, PRC
Hao Jiang  Microsoft Research, China, Beijing, PRC
HongJiang Zhang  Microsoft Research, China, Beijing, PRC
Sponsors
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
SIGCOMM: ACM Special Interest Group on Data Communication
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/500141.500173

ABSTRACT

In this paper, we present a robust algorithm for audio classification that segments an audio stream and classifies it into speech, music, environment sound, and silence. Classification proceeds in two steps, which makes the method suitable for different applications. The first step discriminates speech from non-speech, using a novel algorithm based on KNN and LSP VQ. The second step further divides the non-speech class into music, environment sound, and silence with a rule-based classification scheme. New features such as the noise frame ratio and band periodicity are introduced and discussed in detail. Our experiments in the context of video structure parsing have shown that the algorithms produce very satisfactory results.
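
The two-step structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no feature definitions, thresholds, or rule details, so the `Clip` fields, the threshold values, and the direction of each rule are assumptions made for illustration, and the LSP VQ component of the first step is omitted entirely. Only the overall shape (a KNN speech/non-speech decision followed by rules over the non-speech class) comes from the text.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass
class Clip:
    # All fields are hypothetical stand-ins; the abstract does not define them.
    features: tuple           # per-clip feature vector used by stage 1
    energy: float             # assumed short-time energy, scaled to 0..1
    noise_frame_ratio: float  # feature named in the abstract; 0..1 scale assumed
    band_periodicity: float   # feature named in the abstract; 0..1 scale assumed


def knn_is_speech(features, training, k=3):
    """Stage 1: majority vote among the k nearest labelled training vectors."""
    nearest = sorted(training, key=lambda item: dist(features, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0] == "speech"


def classify_non_speech(clip, silence_energy=0.05, nfr_max=0.3, bp_min=0.5):
    """Stage 2: rule-based split; every threshold here is a placeholder."""
    if clip.energy < silence_energy:
        return "silence"
    # Assumed rule: music is less noisy and more periodic than environment sound.
    if clip.noise_frame_ratio <= nfr_max and clip.band_periodicity >= bp_min:
        return "music"
    return "environment sound"


def classify(clip, training, k=3):
    """Run stage 1, and fall through to the stage 2 rules for non-speech clips."""
    if knn_is_speech(clip.features, training, k):
        return "speech"
    return classify_non_speech(clip)
```

Keeping the two stages as separate functions mirrors the abstract's point that the two-step design suits different applications: a caller that only needs speech/non-speech discrimination can stop after `knn_is_speech`.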





REVIEW

Reviewer: Hadi Harb

The authors present a technique for the classification of audio into speech, music, environment sounds, and silence classes. Such a classification is useful for audio indexing and retrieval, and for video structure extraction.

