ACM Home Page
Please provide us with feedback. Feedback
Efficiently matching sets of features with random histograms
Full text PdfPdf (313 KB)
Source
International Multimedia Conference archive
Proceeding of the 16th ACM international conference on Multimedia table of contents
Vancouver, British Columbia, Canada
SESSION: Content track C5: multimedia content analysis and applications table of contents
Pages 179-188  
Year of Publication: 2008
ISBN:978-1-60558-303-7
Authors
Wei Dong  Princeton University, Princeton, NJ, USA
Zhe Wang  Princeton University, Princeton, NJ, USA
Moses Charikar  Princeton University, Princeton, NJ, USA
Kai Li  Princeton University, Princeton, NJ, USA
Sponsors
ACM: Association for Computing Machinery
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 15,   Downloads (12 Months): 171,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1459359.1459384
What is a DOI?

ABSTRACT

As the commonly used representation of a feature-rich data object has evolved from a single feature vector to a set of feature vectors, a key challenge in building a content-based search engine for feature-rich data is to match feature-sets efficiently. Although substantial progress has been made during the past few years, existing approaches are still inefficient and inflexible for building a search engine for massive datasets. This paper presents a randomized algorithm to embed a set of features into a single high-dimensional vector to simplify the feature-set matching problem. The main idea is to project feature vectors into an auxiliary space using locality sensitive hashing and to represent a set of features as a histogram in the auxiliary space. A histogram is simply a high dimensional vector, and efficient similarity measures like L1 and L2 distances can be employed to approximate feature-set distance measures.

We evaluated the proposed approach under three different task settings, i.e. content-based image search, image object recognition and near-duplicate video clip detection. The experimental results show that the proposed approach is indeed effective and flexible. It can achieve accuracy comparable to the feature-set matching methods, while requiring significantly less space and time. For object recognition with Caltech 101 dataset, our method runs 25 times faster to achieve the same precision as Pyramid Matching Kernel, the state-of-the-art feature-set matching method.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
4
5
 
6
 
7
W. Dong, M. Charikar, and K. Li. High dimensional similarity search with sketches. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research & Development on Information Retrieval, Singapore, July 2008.
 
8
 
9
 
10
K. Grauman and T. Darrell. Pyramid match hashing: Sub-linear time indexing over partial correspondences. Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on, pages 1--8, 17-22 June 2007.
 
11
 
12
K. Grauman and T. Darrell. Approximate correspondences in high dimensions. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 505--512. MIT Press, Cambridge, MA, 2007.
 
13
14
 
15
P. Indyk and N. Thaper. Fast image retrieval via embeddings. In 3rd International Workshop on Statistical and Computational Theories of Vision, 2003.
16
 
17
Y. Ke and R. Sukthankar. Pca-sift: A more distinctive representation for local image descriptors. cvpr, 02:506--513, 2004.
 
18
A. Kirsch and M. Mitzenmacher. Distance-sensitive bloom filters. In ALENEX '06: Proceedings of the Ninth Workshop on Algorithm Engineering and Experiments, pages 41--50, Philadelphia, PA, USA, 2006. Society for Industrial and Applied Mathematics.
 
19
 
20
B. Leibe and B. Schiele. Analyzing appearance and contour based methods for object categorization. cvpr, 02:409, 2003.
 
21
22
23
 
24
S. Lyu. A kernel between unordered sets of data: The gaussian mixture approach. In European Conference on Machine Learning (ECML), Porto, Portugal, 2005.
 
25
 
26
A. Opelt, M. Fussenegger, A. Pinz, and P. Auer. Weak Hypotheses and Boosting for Generic Object Detection and Recognition, volume 3022/2004 of Lecture Notes in Computer Science, pages 71--84. Springer Berlin / Heidelberg, 2004.
 
27
 
28
29
 
30

Collaborative Colleagues:
Wei Dong: colleagues
Zhe Wang: colleagues
Moses Charikar: colleagues
Kai Li: colleagues