Scaling up all pairs similarity search
Source: Proceedings of the 16th International Conference on World Wide Web (International World Wide Web Conference)
Location: Banff, Alberta, Canada
Session: Similarity search
Pages: 131-140
Year of Publication: 2007
ISBN: 978-1-59593-654-7
Authors
Roberto J. Bayardo, Google, Inc., Mountain View, CA
Yiming Ma, University of California, Irvine, Irvine, CA
Ramakrishnan Srikant, Google, Inc., Mountain View, CA
Sponsor: ACM (Association for Computing Machinery)
Publisher: ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 132,   Downloads (12 Months): 287,   Citation Count: 13
DOI: http://doi.acm.org/10.1145/1242572.1242591

ABSTRACT

Given a large collection of sparse vector data in a high-dimensional space, we investigate the problem of finding all pairs of vectors whose similarity score (as determined by a function such as cosine similarity) is above a given threshold. We propose a simple algorithm based on novel indexing and optimization strategies that solves this problem without relying on approximation methods or extensive parameter tuning. We show that the approach efficiently handles a variety of datasets across a wide range of similarity thresholds, with large speedups over previous state-of-the-art approaches.
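The exact problem stated above can be illustrated with a minimal baseline sketch. It assumes each vector is a Python dict mapping a feature id to a weight (a hypothetical representation chosen for illustration) and that the similarity function is cosine similarity. It uses a plain inverted index with no pruning, so it is not the authors' optimized algorithm; the function names `normalize` and `all_pairs` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: feature id -> weight.
SparseVec = Dict[int, float]


def normalize(v: SparseVec) -> SparseVec:
    """Scale a sparse vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {f: w / norm for f, w in v.items()} if norm > 0 else {}


def all_pairs(vectors: List[SparseVec], t: float) -> List[Tuple[int, int, float]]:
    """Return (i, j, sim) for every pair i < j whose cosine similarity is >= t.

    Baseline inverted-index method (exact, no pruning): each vector is matched
    against the postings of earlier vectors and then added to the index.
    """
    unit = [normalize(v) for v in vectors]
    # Inverted index: feature id -> list of (vector id, normalized weight).
    index: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    results: List[Tuple[int, int, float]] = []
    for x_id, x in enumerate(unit):
        # Accumulate dot products against every earlier vector sharing a feature.
        scores: Dict[int, float] = defaultdict(float)
        for f, w in x.items():
            for y_id, wy in index[f]:
                scores[y_id] += w * wy
        for y_id, s in scores.items():
            if s >= t:
                results.append((y_id, x_id, s))
        # Index x only after matching, so each unordered pair is scored exactly
        # once and a vector is never paired with itself.
        for f, w in x.items():
            index[f].append((x_id, w))
    return results
```

With the four vectors `{0: 1.0, 1: 1.0}`, `{0: 1.0, 1: 1.0}`, `{2: 1.0}`, and `{0: 1.0}`, a threshold of 0.8 reports only the pair (0, 1); lowering it to 0.5 also reports (0, 3) and (1, 3), whose cosine similarity is about 0.707. The work done is proportional to the total length of the posting lists scanned, and this sketch makes no attempt at the indexing and optimization strategies the abstract refers to.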


