ABSTRACT
Over the past decade, there has been explosive growth in the availability of multimedia data, particularly image, video, and music. Because of this, content-based music retrieval has attracted attention from the multimedia database and information retrieval communities. Content-based music retrieval requires us to be able to automatically identify particular characteristics of music data. One such characteristic, useful in a range of applications, is the identification of the singer in a musical piece. Unfortunately, existing approaches to this problem suffer from either low accuracy or poor scalability. In this article, we propose a novel scheme, called Hybrid Singer Identifier (HSI), for efficient automated singer recognition. HSI uses multiple low-level features extracted from both vocal and nonvocal music segments to enhance the identification process; it achieves this via a hybrid architecture that builds profiles of individual singer characteristics based on statistical mixture models. An extensive experimental study on a large music database demonstrates the superiority of our method over state-of-the-art approaches in terms of effectiveness, efficiency, scalability, and robustness.
INDEX TERMS
Primary Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.3 Information Search and Retrieval
Subjects: Retrieval models
Additional Classification:
H. Information Systems
H.3 INFORMATION STORAGE AND RETRIEVAL
H.3.4 Systems and Software
Subjects: Performance evaluation (efficiency and effectiveness)
H.5 INFORMATION INTERFACES AND PRESENTATION (I.7)
H.5.5 Sound and Music Computing
Subjects: Modeling
J. Computer Applications
J.5 ARTS AND HUMANITIES
Subjects: Performing arts (e.g., dance, music)
General Terms: Algorithms, Experimentation, Performance
Keywords: EM algorithm, Gaussian mixture models, Music retrieval, classification, evaluation, singer identification, statistical modeling