ACM Home Page
Please provide us with feedback. Feedback
Locality sensitive hash functions based on concomitant rank order statistics
Full text PdfPdf (531 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Las Vegas, Nevada, USA
SESSION: Research papers table of contents
Pages 221-229  
Year of Publication: 2008
ISBN:978-1-60558-193-4
Authors
Kave Eshghi  Hewlett Packard Laboratories, Palo Alto, CA, USA
Shyamsundar Rajaram  Hewlett Packard Laboratories, Palo Alto, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 24,   Downloads (12 Months): 195,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1401890.1401921
What is a DOI?

ABSTRACT

Locality Sensitive Hash functions are invaluable tools for approximate near neighbor problems in high dimensional spaces. In this work, we are focused on LSH schemes where the similarity metric is the cosine measure. The contribution of this work is a new class of locality sensitive hash functions for the cosine similarity measure based on the theory of concomitants, which arises in order statistics. Consider n i.i.d sample pairs, {(X1; Y1); (X2; Y2); : : : ;(Xn; Yn)} obtained from a bivariate distribution f(X, Y). Concomitant theory captures the relation between the order statistics of X and Y in the form of a rank distribution given by Prob(Rank(Yi)=j-Rank(Xi)=k). We exploit properties of the rank distribution towards developing a locality sensitive hash family that has excellent collision rate properties for the cosine measure.

The computational cost of the basic algorithm is high for high hash lengths. We introduce several approximations based on the properties of concomitant order statistics and discrete transforms that perform almost as well, with significantly reduced computational cost. We demonstrate the practical applicability of our algorithms by using it for finding similar images in an image repository.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
3
4
 
5
H. A. David, M. J. O'Connell, and S. S. Yang. Distribution and expected value of the rank of a concomitant of an order statistic. The Annals of Statistics, 5(1):216--223, January 1977.
 
6
7
8
9
 
10
11
 
12
13
 
14
K. Singh, M. Ma, and D. W. Park. A content-based image retrieval using fft & cosine similarity coefficient. In Signal and Image Processing, 2003.
 
15
Santosh Vempala. The Random Projection Method. American Mathematical Society, Providence, RI, 2004.
 
16
Daniel Zwillinger. CRC Standard Mathematical Tables and Formulae, 31st Edition. CRC Press, 2003.

Collaborative Colleagues:
Kave Eshghi: colleagues
Shyamsundar Rajaram: colleagues