| Scaling up all pairs similarity search |
| Full text |
Pdf
(507 KB)
|
Source
|
International World Wide Web Conference
archive
Proceedings of the 16th international conference on World Wide Web
table of contents
Banff, Alberta, Canada
SESSION: Similarity search
table of contents
Pages: 131 - 140
Year of Publication: 2007
ISBN:978-1-59593-654-7
|
|
Authors
|
|
| Sponsor |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 132, Downloads (12 Months): 287, Citation Count: 13
|
|
|
ABSTRACT
Given a large collection of sparse vector data in a high dimensional space, we investigate the problem of finding all pairs of vectors whose similarity score (as determined by a function such as cosine distance) is above a given threshold. We propose a simple algorithm based on novel indexing and optimization strategies that solves this problem without relying on approximation methods or extensive parameter tuning. We show the approach efficiently handles a variety of datasets across a wide setting of similarity thresholds, with large speedups over previous state-of-the-art approaches.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
|
 |
3
|
Christian Böhm , Bernhard Braunmüller , Markus Breunig , Hans-Peter Kriegel, High performance clustering based on the similarity join, Proceedings of the ninth international conference on Information and knowledge management, p.298-305, November 06-11, 2000, McLean, Virginia, United States
[doi> 10.1145/354756.354832]
|
| |
4
|
Andrei Z. Broder , Steven C. Glassman , Mark S. Manasse , Geoffrey Zweig, Syntactic clustering of the Web, Selected papers from the sixth international conference on World Wide Web, p.1157-1166, September 1997, Santa Clara, California, United States
|
 |
5
|
|
 |
6
|
|
| |
7
|
|
 |
8
|
|
 |
9
|
|
 |
10
|
|
| |
11
|
|
 |
12
|
|
 |
13
|
|
| |
14
|
A. Moffat, R. Sacks-Davis, R. Wilkinson, & J. Zobel (1994). Retrieval of partial documents. In The Second Text REtrieval Conference, 181--190.
|
 |
15
|
|
| |
16
|
|
| |
17
|
M. Persin, J. Zobel, & R. Sacks-Davis (1994). Fast document ranking for large scale information retrieval. In Proc. of the First Int'l Conf. on Applications of Databases, Lecture Notes in Computer Science v819, 253--266.
|
| |
18
|
|
 |
19
|
|
 |
20
|
|
 |
21
|
|
 |
22
|
|
| |
23
|
|
CITED BY 13
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Benjamin E. Teitler , Michael D. Lieberman , Daniele Panozzo , Jagan Sankaranarayanan , Hanan Samet , Jon Sperling, NewsStand: a new view on news, Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems, November 05-07, 2008, Irvine, California
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Wei Wang , Chuan Xiao , Xuemin Lin , Chengqi Zhang, Efficient approximate entity extraction with edit distance constraints, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|
|
|
|
|
|
|