Efficient online top-K retrieval with arbitrary similarity measures
ACM International Conference Proceeding Series; Vol. 261
Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Nantes, France
SESSION: Research sessions: Skyline, top-k, preferences
Pages: 356-367
Year of Publication: 2008
ISBN: 978-1-59593-926-5
ABSTRACT
The top-k retrieval problem requires finding the k objects most similar to a given query object. Similarities between objects are most often computed as aggregated similarities of their attribute values. We consider the case where the similarities between attribute values are arbitrary (non-metric), so standard space-partitioning indexes cannot be used. Among the most popular techniques that can handle arbitrary similarity measures is the family of threshold algorithms. These were designed as middleware algorithms that assume that similarity lists for each attribute are available, and they focus on efficiently merging these lists to arrive at the results. In this paper, we explore multi-dimensional indexing of non-metric spaces that can lead to efficient pruning of the search space during top-k computation by utilizing inter-attribute relationships. We propose an indexing structure, the AL-Tree, and an algorithm that uses it to perform top-k retrieval in an online fashion. The AL-Tree exploits the fact that many real-world attributes come from a small value space. We show that our algorithm performs much better than the threshold-based algorithms in terms of computational cost, due to efficient pruning of the search space. Further, it outperforms them in terms of I/Os by up to an order of magnitude on dense datasets.
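To make the problem statement concrete, the following is a minimal sketch of top-k retrieval by exhaustive scan, assuming the aggregate is a plain sum of per-attribute similarities. It is not the AL-Tree algorithm or a threshold algorithm; it only illustrates the baseline that such methods aim to avoid. The names `top_k`, `color_sim`, `body_sim`, and the similarity tables are hypothetical and chosen for illustration only.

```python
import heapq

# Hypothetical, hand-written similarity tables. They are arbitrary lookups,
# not distances: nothing here guarantees symmetry or the triangle inequality.
COLOR_SIM = {("red", "red"): 1.0, ("red", "maroon"): 0.8, ("red", "blue"): 0.1}
BODY_SIM = {("sedan", "sedan"): 1.0, ("sedan", "coupe"): 0.6, ("sedan", "truck"): 0.1}


def color_sim(query_value, object_value):
    # Unknown pairs default to 0.0 (an assumption of this sketch).
    return COLOR_SIM.get((query_value, object_value), 0.0)


def body_sim(query_value, object_value):
    return BODY_SIM.get((query_value, object_value), 0.0)


def top_k(query, objects, sims, k):
    """Return the k objects with the highest aggregated similarity to query.

    The aggregate is assumed to be the sum of per-attribute similarities;
    every object is scored, so no pruning takes place.
    """
    def score(obj):
        return sum(sim(q, v) for sim, q, v in zip(sims, query, obj))

    return heapq.nlargest(k, objects, key=score)


cars = [("maroon", "sedan"), ("red", "truck"), ("blue", "coupe"), ("red", "coupe")]
best = top_k(("red", "sedan"), cars, [color_sim, body_sim], k=2)
print(best)
```

Because every object is scored, the cost grows linearly with the dataset size; the pruning described in the abstract is what an index is meant to add on top of this baseline.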