| A novel optimization approach to efficiently process aggregate similarity queries in metric access methods |
| Full text |
Pdf
(628 KB)
|
Source
|
Conference on Information and Knowledge Management
archive
Proceeding of the 17th ACM conference on Information and knowledge management
table of contents
Napa Valley, California, USA
SESSION: DB: efficient maintenance and query optimization
table of contents
Pages 193-202
Year of Publication: 2008
ISBN:978-1-59593-991-3
|
|
Authors
|
|
Humberto L. Razente
|
University of São Paulo (USP), São Carlos (SP), Brazil
|
|
Maria Camila N. Barioni
|
Federal University of ABC (UFABC), Santo Andre (SP), Brazil
|
|
Agma Juci M. Traina
|
University of São Paulo (USP), Sao Carlos (SP), Brazil
|
|
Christos Faloutsos
|
Carnegie Mellon University (CMU), Pittsburgh, PA, USA
|
|
Caetano Traina, Jr.
|
University of São Paulo (USP), Sao Carlos (SP), Brazil
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 16, Downloads (12 Months): 134, Citation Count: 0
|
|
|
ABSTRACT
A similarity query considers an element as the query center and searches a dataset to find either the elements far up to a bounding radius or the k nearest ones from the query center. Several algorithms have been developed to efficiently execute similarity queries. However, there are queries that require more than one center, which we call Aggregate Similarity Queries. Such queries appear when the user gives multiple desirable examples, and requests data elements that are similar to all of the examples, as in the case of applying relevance feedback. Here we give the first algorithms that can handle aggregate similarity queries on Metric Access Methods (MAM) such as the M-tree and Slim-tree. Our method, which we call Metric Aggregate Similarity Search (MASS) has the following properties: (a) it requires only the triangle inequality property; (b) it guarantees no false-dismissals, as we prove that it lower-bounds the aggregate distance scores; (c) it can work with any MAM; (d) it can handle any number of query centers, which are either scattered all over the space or concentrated on a restricted region. Experiments on both real and synthetic data show that our method scales on both the number of elements and, if the dataset is in a spatial domain, also on its dimensionality. Moreover, it achieves better results than previous related methods.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
A. Asuncion and D. J. Newman. UCI machine learning repository, 2007. University of California, Irvine, http://archive.ics.uci.edu/ml/.
|
| |
2
|
|
| |
3
|
|
| |
4
|
|
 |
5
|
|
| |
6
|
|
 |
7
|
|
 |
8
|
Humberto L. Razente , Maria Camila N. Barioni , Agma J. M. Traina , Caetano Traina, Jr., Aggregate similarity queries in relevance feedback methods for content-based image retrieval, Proceedings of the 2008 ACM symposium on Applied computing, March 16-20, 2008, Fortaleza, Ceara, Brazil
[doi> 10.1145/1363686.1363887]
|
 |
9
|
Nick Roussopoulos , Stephen Kelley , Frédéric Vincent, Nearest neighbor queries, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.71-79, May 22-25, 1995, San Jose, California, United States
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
|
| |
14
|
|
| |
15
|
|
|