ACM Home Page
Please provide us with feedback. Feedback
A novel optimization approach to efficiently process aggregate similarity queries in metric access methods
Full text PdfPdf (628 KB)
Source
Conference on Information and Knowledge Management archive
Proceeding of the 17th ACM conference on Information and knowledge management table of contents
Napa Valley, California, USA
SESSION: DB: efficient maintenance and query optimization table of contents
Pages 193-202  
Year of Publication: 2008
ISBN:978-1-59593-991-3
Authors
Humberto L. Razente  University of São Paulo (USP), São Carlos (SP), Brazil
Maria Camila N. Barioni  Federal University of ABC (UFABC), Santo Andre (SP), Brazil
Agma Juci M. Traina  University of São Paulo (USP), Sao Carlos (SP), Brazil
Christos Faloutsos  Carnegie Mellon University (CMU), Pittsburgh, PA, USA
Caetano Traina, Jr.  University of São Paulo (USP), Sao Carlos (SP), Brazil
Sponsors
ACM: Association for Computing Machinery
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16,   Downloads (12 Months): 134,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1458082.1458110
What is a DOI?

ABSTRACT

A similarity query considers an element as the query center and searches a dataset to find either the elements far up to a bounding radius or the k nearest ones from the query center. Several algorithms have been developed to efficiently execute similarity queries. However, there are queries that require more than one center, which we call Aggregate Similarity Queries. Such queries appear when the user gives multiple desirable examples, and requests data elements that are similar to all of the examples, as in the case of applying relevance feedback. Here we give the first algorithms that can handle aggregate similarity queries on Metric Access Methods (MAM) such as the M-tree and Slim-tree. Our method, which we call Metric Aggregate Similarity Search (MASS) has the following properties: (a) it requires only the triangle inequality property; (b) it guarantees no false-dismissals, as we prove that it lower-bounds the aggregate distance scores; (c) it can work with any MAM; (d) it can handle any number of query centers, which are either scattered all over the space or concentrated on a restricted region. Experiments on both real and synthetic data show that our method scales on both the number of elements and, if the dataset is in a spatial domain, also on its dimensionality. Moreover, it achieves better results than previous related methods.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
A. Asuncion and D. J. Newman. UCI machine learning repository, 2007. University of California, Irvine, http://archive.ics.uci.edu/ml/.
 
2
 
3
 
4
5
 
6
7
8
9
 
10
11
 
12
 
13
 
14
 
15

Collaborative Colleagues:
Humberto L. Razente: colleagues
Maria Camila N. Barioni: colleagues
Agma Juci M. Traina: colleagues
Christos Faloutsos: colleagues
Caetano Traina, Jr.: colleagues