ABSTRACT
This paper presents an implementation of a content-based music retrieval system that takes a user's acoustic input (an 8-second clip of singing or humming) via a microphone and retrieves the intended song from a database containing over 3000 candidate songs. The system, known as Super MBox, demonstrates the feasibility of real-time music retrieval with a high success rate. Super MBox first converts the user's acoustic input into a pitch vector. A hierarchical filtering method (HFM) then filters out 80% of the candidates as unlikely matches and compares the query input with the remaining 20% of the candidates in detail. The output of Super MBox is a list of songs ranked by their computed similarity scores. A brief mathematical analysis of the two-step HFM is given in the paper to explain how to derive the optimum parameters of the comparison engine. The proposed HFM and its analysis framework can be directly applied to other multimedia information retrieval systems. We have tested Super MBox extensively and found that the top-20 success rate is over 85%, based on a dataset of about 2000 singing/humming clips from people with mediocre singing skills. Our studies demonstrate the feasibility of using Super MBox as a prototype for music search engines over the Internet and/or query engines in digital music libraries.
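The two-step filtering idea described in the abstract can be illustrated with a minimal Python sketch. This is not the Super MBox implementation: the abstract does not name the distance measures, so the cheap first-stage score (L1 distance on downsampled pitch vectors), the detailed second-stage score (dynamic time warping), the mean-subtraction step, and all function and parameter names (`coarse_distance`, `dtw_distance`, `hierarchical_search`, `keep_ratio`, `top_k`) are assumptions chosen for illustration. Only the 20% survival ratio and the top-20 ranked output come from the text.

```python
import math

def normalize(pitch):
    """Subtract the mean pitch (assumed step) so a transposed query can still match."""
    mean = sum(pitch) / len(pitch)
    return [p - mean for p in pitch]

def coarse_distance(a, b, step=4):
    """Cheap stage-one score (assumed): L1 distance on downsampled pitch vectors."""
    a, b = a[::step], b[::step]
    return sum(abs(x - y) for x, y in zip(a, b))

def dtw_distance(a, b):
    """Costlier stage-two score (assumed): classic dynamic time warping."""
    inf = math.inf
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[-1]

def hierarchical_search(query, database, keep_ratio=0.2, top_k=20):
    """Return up to top_k song names, best match first.

    database maps a song name to its pitch vector (hypothetical layout).
    """
    q = normalize(query)
    songs = {name: normalize(pitch) for name, pitch in database.items()}

    # Stage 1: rank every song with the cheap score and keep the best fraction.
    ranked = sorted(songs, key=lambda name: coarse_distance(q, songs[name]))
    keep = max(1, math.ceil(keep_ratio * len(ranked)))
    survivors = ranked[:keep]

    # Stage 2: rescore only the survivors with the detailed comparison.
    scored = sorted((dtw_distance(q, songs[name]), name) for name in survivors)
    return [name for _, name in scored[:top_k]]
```

In this sketch the first stage trades accuracy for speed: it may let a wrong song through, but the detailed comparison then runs on only a fraction of the database. How to choose that fraction is the kind of parameter question the paper's analysis addresses.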