ACM Home Page
Please provide us with feedback. Feedback
Supporting ranking and clustering as generalized order-by and group-by
Full text PdfPdf (413 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2007 ACM SIGMOD international conference on Management of data table of contents
Beijing, China
SESSION: Top-k queries and ranking table of contents
Pages: 127 - 138  
Year of Publication: 2007
ISBN:978-1-59593-686-8
Authors
Chengkai Li  University of Illinois at Urbana-Champaign, Urbana, IL
Min Wang  IBM T.J. Watson Research Center, Hawthorne, NY
Lipyeow Lim  IBM T.J. Watson Research Center, Hawthorne, NY
Haixun Wang  IBM T.J. Watson Research Center, Hawthorne, NY
Kevin Chen-Chuan Chang  University of Illinois at Urbana-Champaign, Urbana, IL
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 128,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1247480.1247496
What is a DOI?

ABSTRACT

The Boolean semantics of SQL queries cannot adequately capture the "fuzzy" preferences and "soft" criteria required in non-traditional data retrieval applications. One way to solve this problem is to add a flavor of "information retrieval" into database queries by allowing fuzzy query conditions and flexibly supporting grouping and ranking of the query results within the DBMS engine. While ranking is already supported by all major commercial DBMSs natively, support of flexibly grouping is still very limited (i.e., group-by).

In this paper, we propose to generalize group-by to enable flexible grouping (clustering specifically) of the query results. Different from clustering in data mining applications, our focus is on supporting efficient clustering of Boolean results generated at query time. Moreover, we propose to integrate ranking and clustering with Boolean conditions, forming a new type of ClusterRank query to allow structured data retrieval. Such an integration is nontrivial in terms of both semantics and query processing. We investigate various semantics of this type of queries. To process such queries, a straightforward approach is to simply glue the techniques developed for ranking-only and clustering-only together. This approach is costly since both ranking and clustering are treated as blocking post-processing tasks upon Boolean query results by existing techniques. We propose a summary-based evaluation method that utilizes bitmap index to seamlessly integrate Boolean conditions, clustering, and ranking. Experimental study shows that our approach significantly outperforms the straightforward one and maintains high clustering quality.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
S. Agrawal, S. Chaudhuri, G. Das, and A. Gionis. Automated ranking of database query results. In CIDR, 2003.
 
2
P. Berkhin. Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA, 2002.
 
3
M. J. Carey and D. Kossmann. On saying "enough already!" in SQL. In SIGMOD, pages 219--230, 1997.
4
 
5
C. Y. Chan and Y. E. Ioannidis. An efficient bitmap encoding scheme for selection queries. In SIGMOD, 1999.
 
6
 
7
S. Chaudhuri, R. Ramakrishnan, and G. Weikum. Integrating DB and IR technologies: What is the sound of one hand clapping? In CIDR, pages 1--12, 2005.
 
8
D. Donjerkovic and R. Ramakrishnan. Probabilistic optimization of top n queries. In VLDB, 1999.
 
9
R. Fagin, A. Lote, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, 2001.
10
11
12
 
13
J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, New York, 2000.
 
14
V. Hristidis, N. Koudas, and Y. Papakonstantinou. PREFER: A system for the efficient execution of multi-parametric ranked queries. SIGMOD, 2001.
 
15
I. F. Ilyas, W. G. Aref, and A. K. Elmagarmid. Supporting top-k join queries in relational databases. In VLDB, pages 754--765, 2003.
16
 
17
K. Kerdprasop, N. Kerdprasop, and P. Sattayatham. Weighted k-means for density-biased clustering. In DaWaK, pages 488--497, 2005.
18
 
19
A. Leuski and J. Allan. Improving interactive retrieval by combining ranked lists and clustering. In RIAO, pages 665--681, 2000.
20
 
21
 
22
23
 
24
P. E. O'Neil and D. Quass. Improved query performance with variant indexes. In SIGMOD, pages 38--49, 1997.
 
25
 
26
R. R. Sinha, S. Mitra, and M. Winslett. Bitmap indexes for large scientific data sets: A case study. In IPDPS, 2006.
 
27
28
 
29
 
30
F. Zemke, K. Kulkarni, A. Witkowski, and B. Lyle. Introduction to OLAP function. Change proposal. ANS-NCTS H2-99-14 (April), 1999.
 
31
T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An efficient data clustering method for very large databases. In SIGMOD, pages 103--114, 1996.


Collaborative Colleagues:
Chengkai Li: colleagues
Min Wang: colleagues
Lipyeow Lim: colleagues
Haixun Wang: colleagues
Kevin Chen-Chuan Chang: colleagues