Efficient online top-K retrieval with arbitrary similarity measures
Source ACM International Conference Proceeding Series; Vol. 261
Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Nantes, France
SESSION: Research sessions: Skyline, top-k, preferences
Pages: 356-367
Year of Publication: 2008
ISBN: 978-1-59593-926-5
Authors
Prasad M Deshpande  IBM India Research Lab, Bangalore, India
Deepak P  IBM India Research Lab, Bangalore, India
Krishna Kummamuru  IBM India Research Lab, Bangalore, India
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1353343.1353388

ABSTRACT

The top-k retrieval problem requires finding the k objects most similar to a given query object. Similarities between objects are most often computed as aggregated similarities of their attribute values. We consider the case where the similarities between attribute values are arbitrary (non-metric), so standard space-partitioning indexes cannot be used. Among the most popular techniques that can handle arbitrary similarity measures is the family of threshold algorithms. These were designed as middleware algorithms that assume similarity lists for each attribute are available, and they focus on efficiently merging these lists to arrive at the results. In this paper, we explore multi-dimensional indexing of non-metric spaces, which can lead to efficient pruning of the search space by utilizing inter-attribute relationships during top-k computation. We propose an indexing structure, the AL-Tree, and an algorithm that uses it to perform top-k retrieval in an online fashion. The AL-Tree exploits the fact that many real-world attributes come from a small value space. We show that our algorithm performs much better than the threshold-based algorithms in terms of computational cost, owing to efficient pruning of the search space. Further, it outperforms them in terms of I/Os by up to an order of magnitude on dense datasets.
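
To make the baseline concrete, the following is a minimal sketch of a generic threshold-style merge of per-attribute similarity lists, the family of algorithms the abstract compares against. It is not the AL-Tree and not the authors' implementation. The function name `threshold_topk`, the toy objects, the similarity tables, and the choice of a plain sum as the aggregation function are all assumptions made for illustration. The sketch also assumes every attribute list contains every object.

```python
import heapq

def threshold_topk(sim_lists, k):
    """Generic threshold-style top-k merge (illustrative sketch).

    sim_lists: one dict per attribute, mapping object id -> similarity of
    that object's attribute value to the query's value. Assumes every dict
    has the same keys and that the aggregate score is the sum of the
    per-attribute similarities (a monotone aggregation).
    """
    # Sorted access: each attribute list in descending similarity order.
    sorted_lists = [sorted(s.items(), key=lambda kv: -kv[1]) for s in sim_lists]
    num_objects = len(sorted_lists[0])
    scores = {}  # aggregate score of every object seen so far

    for depth in range(num_objects):
        for lst in sorted_lists:
            obj, _ = lst[depth]
            if obj not in scores:
                # Random access: fetch the object's similarity in every list.
                scores[obj] = sum(s[obj] for s in sim_lists)
        # No unseen object can score higher than this threshold.
        threshold = sum(lst[depth][1] for lst in sorted_lists)
        best = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        if len(best) >= k and best[-1][1] >= threshold:
            return best  # early stop: the current top k cannot be beaten
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

# Hypothetical data: two categorical attributes with arbitrary
# (non-metric) similarity tables defined by simple lookup.
objects = {
    "a": ("red", "round"),
    "b": ("orange", "square"),
    "c": ("blue", "round"),
    "d": ("red", "square"),
}
color_sim = {("red", "red"): 1.0, ("red", "orange"): 0.8, ("red", "blue"): 0.1}
shape_sim = {("round", "round"): 1.0, ("round", "square"): 0.3}
query = ("red", "round")

sim_lists = [
    {oid: color_sim[(query[0], vals[0])] for oid, vals in objects.items()},
    {oid: shape_sim[(query[1], vals[1])] for oid, vals in objects.items()},
]
result = threshold_topk(sim_lists, 2)
print(result)
```

On this toy input the merge stops at the third position of the sorted lists, once the second-best seen score is at least the threshold. Note that each attribute list is processed independently of the others; the abstract's point is that an index exploiting inter-attribute relationships can prune more of the search space than such list merging.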



