ACM Home Page
Please provide us with feedback. Feedback
Similarity caching
Full text PdfPdf (484 KB)
Source
Symposium on Principles of Database Systems archive
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems table of contents
Providence, Rhode Island, USA
SESSION: Data analysis and optimization table of contents
Pages 127-136  
Year of Publication: 2009
ISBN:978-1-60558-553-6
Authors
Flavio Chierichetti  Sapienza University of Rome, Dipartmento di Informatica, Roma, Italy
Ravi Kumar  Yahoo! Research, Sunnyvale, CA, USA
Sergei Vassilvitskii  Yahoo! Research, Sunnyvale, CA, USA
Sponsors
ACM: Association for Computing Machinery
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGMOD: ACM Special Interest Group on Management of Data
SIGART: ACM Special Interest Group on Artificial Intelligence
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 35,   Downloads (12 Months): 111,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1559795.1559815
What is a DOI?

ABSTRACT

We introduce the similarity caching problem, a variant of classical caching in which an algorithm can return an element from the cache that is similar, but not necessarily identical, to the query element. We are motivated by buffer management questions in approximate nearest-neighbor applications, especially in the context of caching targeted advertisements on the web. Formally, we assume the queries lie in a metric space, with distance function d(.,.). A query p is considered a cache hit if there is a point q in the cache that is sufficiently close to p, i.e., for a threshold radius r, we have d(p,q) ≤ r. The goal is then to minimize the number of cache misses, vis-à-vis the optimal algorithm. As with classical caching, we use the competitive ratio to measure the performance of different algorithms.

While similarity caching is a strict generalization of classical caching, we show that unless the algorithm is allowed extra power (either in the size of the cache or the threshold r) over the optimal offline algorithm, the problem is intractable. We then proceed to quantify the hardness as a function of the complexity of the underlying metric space. We show that the problem becomes easier as we proceed from general metric spaces to those of bounded doubling dimension, and to Euclidean metrics. Finally, we investigate several extensions of the problem: dependence of the threshold r on the query and a smoother trade-off between the cache-miss cost and the query-query similarity.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
 
2
 
3
4
 
5
H.T. Chou and D.J. DeWitt. An evaluation of buffer management strategies for relational database systems. Algorithmica, 1(3):311--336, 1986.
 
6
M. Chrobak and L.L. Larmore. Metrical service systems: Deterministic strategies. Manuscript.
7
8
9
 
10
 
11
 
12
 
13
T.F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293--306, 1985.
14
 
15
16
17
18
19
 
20
A. Sharp. Thoughts on the competitive ratio. Manuscript.
 
21
22
23

Collaborative Colleagues:
Flavio Chierichetti: colleagues
Ravi Kumar: colleagues
Sergei Vassilvitskii: colleagues