Local summarization and multi-level LSH for retrieving multi-variant audio tracks
Source
International Multimedia Conference
Proceedings of the seventeenth ACM international conference on Multimedia
Beijing, China
SESSION: Applications track A3: information summarization
Pages 341-350
Year of Publication: 2009
ISBN: 978-1-60558-608-3
Authors
Yi Yu, New Jersey Institute of Technology, Newark, NJ, USA
Michel Crucianu, Conservatoire National des Arts et Métiers, Paris, France
Vincent Oria, New Jersey Institute of Technology, Newark, NJ, USA
Lei Chen, Hong Kong University of Science and Technology, Hong Kong, China
ABSTRACT
In this paper we study the problem of detecting and grouping multi-variant audio tracks in large audio datasets. Addressing this problem requires a retrieval method that is both fast and reliable. Reliability, however, requires elaborate representations of audio content, which makes fast similarity-based retrieval from a large audio database difficult. To find a better tradeoff between retrieval quality and efficiency, we put forward an approach relying on local summarization and multi-level Locality-Sensitive Hashing (LSH). More precisely, each audio track is divided into multiple Continuously Correlated Periods (CCPs) of variable length according to spectral similarity. Each CCP is described by its Weighted Mean Chroma (WMC), so a track is represented as a sequence of WMCs. An adapted two-level LSH is then employed to efficiently delineate a narrow relevant search region: the "coarse" hashing level restricts the search to items having a non-negligible similarity to the query, and the subsequent "refined" level returns only items showing a much higher similarity. Experimental evaluations performed on a real multi-variant audio dataset confirm that our approach supports fast and reliable retrieval of audio track variants.
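The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the segmentation rule, the weighting scheme, or the hash families, so everything below is an assumption made for illustration. Segments are cut where the Pearson correlation of adjacent chroma frames drops below a hypothetical threshold, frames are weighted by their total energy, and both hashing levels use random-hyperplane signatures, with fewer bits for the coarse level and more for the refined one. All names (`split_into_ccps`, `weighted_mean_chroma`, `TwoLevelIndex`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def correlation(a, b):
    """Pearson correlation between two equal-length chroma frames."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def split_into_ccps(frames, threshold=0.8):
    """Group consecutive frames while adjacent frames stay correlated.

    Assumption: the threshold rule stands in for the paper's
    spectral-similarity criterion, which the abstract does not specify.
    """
    if not frames:
        return []
    segments = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if correlation(prev, cur) >= threshold:
            segments[-1].append(cur)
        else:
            segments.append([cur])
    return segments


def weighted_mean_chroma(segment):
    """Average one segment's frames, weighting each by its total energy (assumed weights)."""
    weights = [sum(frame) for frame in segment]
    total = sum(weights) or 1.0
    dim = len(segment[0])
    return [sum(w * f[i] for w, f in zip(weights, segment)) / total
            for i in range(dim)]


def make_hash(dim, bits, rng):
    """Random-hyperplane signature: one sign bit per hyperplane (assumed hash family)."""
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]

    def signature(vector):
        return tuple(int(sum(p * v for p, v in zip(plane, vector)) >= 0)
                     for plane in planes)

    return signature


class TwoLevelIndex:
    """Coarse buckets narrow the search; refined buckets inside them narrow it further."""

    def __init__(self, dim, coarse_bits=4, fine_bits=12, seed=0):
        rng = random.Random(seed)
        self.coarse = make_hash(dim, coarse_bits, rng)
        self.fine = make_hash(dim, fine_bits, rng)
        self.buckets = defaultdict(lambda: defaultdict(set))

    def add(self, track_id, wmc):
        self.buckets[self.coarse(wmc)][self.fine(wmc)].add(track_id)

    def query(self, wmc):
        coarse_bucket = self.buckets.get(self.coarse(wmc))
        if coarse_bucket is None:
            return set()
        return set(coarse_bucket.get(self.fine(wmc), set()))


# Toy usage: four 12-bin chroma frames forming two correlated runs.
a = [1.0] + [0.0] * 11
b = [0.0] * 6 + [1.0] + [0.0] * 5
frames = [a, a, b, b]
wmcs = [weighted_mean_chroma(s) for s in split_into_ccps(frames)]
index = TwoLevelIndex(dim=12)
for wmc in wmcs:
    index.add("track-1", wmc)
candidates = index.query(wmcs[0])
```

Nesting the refined buckets inside the coarse ones means a query touches only items that agree with it at both levels, which mirrors the narrowing search region the abstract describes. A practical system would typically use several hash tables per level and verify candidates with a sequence comparison, neither of which is shown here.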