| Efficiently matching sets of features with random histograms |
| Full text |
Pdf
(313 KB)
|
Source
|
International Multimedia Conference
archive
Proceeding of the 16th ACM international conference on Multimedia
table of contents
Vancouver, British Columbia, Canada
SESSION: Content track C5: multimedia content analysis and applications
table of contents
Pages 179-188
Year of Publication: 2008
ISBN:978-1-60558-303-7
|
|
Authors
|
|
Wei Dong
|
Princeton University, Princeton, NJ, USA
|
|
Zhe Wang
|
Princeton University, Princeton, NJ, USA
|
|
Moses Charikar
|
Princeton University, Princeton, NJ, USA
|
|
Kai Li
|
Princeton University, Princeton, NJ, USA
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 15, Downloads (12 Months): 171, Citation Count: 0
|
|
|
ABSTRACT
As the commonly used representation of a feature-rich data object has evolved from a single feature vector to a set of feature vectors, a key challenge in building a content-based search engine for feature-rich data is to match feature-sets efficiently. Although substantial progress has been made during the past few years, existing approaches are still inefficient and inflexible for building a search engine for massive datasets. This paper presents a randomized algorithm to embed a set of features into a single high-dimensional vector to simplify the feature-set matching problem. The main idea is to project feature vectors into an auxiliary space using locality sensitive hashing and to represent a set of features as a histogram in the auxiliary space. A histogram is simply a high dimensional vector, and efficient similarity measures like L1 and L2 distances can be employed to approximate feature-set distance measures. We evaluated the proposed approach under three different task settings, i.e. content-based image search, image object recognition and near-duplicate video clip detection. The experimental results show that the proposed approach is indeed effective and flexible. It can achieve accuracy comparable to the feature-set matching methods, while requiring significantly less space and time. For object recognition with Caltech 101 dataset, our method runs 25 times faster to achieve the same precision as Pyramid Matching Kernel, the state-of-the-art feature-set matching method.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Andrei Z. Broder , Moses Charikar , Alan M. Frieze , Michael Mitzenmacher, Min-wise independent permutations (extended abstract), Proceedings of the thirtieth annual ACM symposium on Theory of computing, p.327-336, May 24-26, 1998, Dallas, Texas, United States
[doi> 10.1145/276698.276781]
|
| |
3
|
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
|
 |
4
|
|
 |
5
|
Mayur Datar , Nicole Immorlica , Piotr Indyk , Vahab S. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, Proceedings of the twentieth annual symposium on Computational geometry, June 08-11, 2004, Brooklyn, New York, USA
[doi> 10.1145/997817.997857]
|
| |
6
|
|
| |
7
|
W. Dong, M. Charikar, and K. Li. High dimensional similarity search with sketches. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research & Development on Information Retrieval, Singapore, July 2008.
|
| |
8
|
|
| |
9
|
|
| |
10
|
K. Grauman and T. Darrell. Pyramid match hashing: Sub-linear time indexing over partial correspondences. Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on, pages 1--8, 17-22 June 2007.
|
| |
11
|
|
| |
12
|
K. Grauman and T. Darrell. Approximate correspondences in high dimensions. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 505--512. MIT Press, Cambridge, MA, 2007.
|
| |
13
|
|
 |
14
|
|
| |
15
|
P. Indyk and N. Thaper. Fast image retrieval via embeddings. In 3rd International Workshop on Statistical and Computational Theories of Vision, 2003.
|
 |
16
|
|
| |
17
|
Y. Ke and R. Sukthankar. Pca-sift: A more distinctive representation for local image descriptors. cvpr, 02:506--513, 2004.
|
| |
18
|
A. Kirsch and M. Mitzenmacher. Distance-sensitive bloom filters. In ALENEX '06: Proceedings of the Ninth Workshop on Algorithm Engineering and Experiments, pages 41--50, Philadelphia, PA, USA, 2006. Society for Industrial and Applied Mathematics.
|
| |
19
|
|
| |
20
|
B. Leibe and B. Schiele. Analyzing appearance and contour based methods for object categorization. cvpr, 02:409, 2003.
|
| |
21
|
|
 |
22
|
|
 |
23
|
Qin Lv , William Josephson , Zhe Wang , Moses Charikar , Kai Li, Ferret: a toolkit for content-based similarity search of feature-rich data, Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, April 18-21, 2006, Leuven, Belgium
|
| |
24
|
S. Lyu. A kernel between unordered sets of data: The gaussian mixture approach. In European Conference on Machine Learning (ECML), Porto, Portugal, 2005.
|
| |
25
|
|
| |
26
|
A. Opelt, M. Fussenegger, A. Pinz, and P. Auer. Weak Hypotheses and Boosting for Generic Object Detection and Recognition, volume 3022/2004 of Lecture Notes in Computer Science, pages 71--84. Springer Berlin / Heidelberg, 2004.
|
| |
27
|
|
| |
28
|
|
 |
29
|
|
| |
30
|
|
|