|
ABSTRACT
With the increasing amount of text data stored in relational databases, there is a demand for RDBMS to support keyword queries over text data. As a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly. In this paper, we study the effectiveness and the efficiency issues of answering top-k keyword query in relational database systems. We propose a new ranking formula by adapting existing IR techniques based on a natural notion of virtual document. Compared with previous approaches, our new ranking method is simple yet effective, and agrees with human perceptions. We also study efficient query processing methods for the new ranking method, and propose algorithms that have minimal accesses to the database. We have conducted extensive experiments on large-scale real databases using two popular RDBMSs. The experimental results demonstrate significant improvement to the alternative approaches in terms of retrieval effectiveness and efficiency.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. In ICDE, pages 5--16, 2002.
|
| |
2
|
Holger Bast , Debapriyo Majumdar , Ralf Schenkel , Martin Theobald , Gerhard Weikum, IO-Top-k: index-access optimized top-k query processing, Proceedings of the 32nd international conference on Very large data bases, September 12-15, 2006, Seoul, Korea
|
| |
3
|
|
| |
4
|
S. Börzsönyi, D. Kossmann, and K. Stocker. The skyline operator. In ICDE, pages 421--430, 2001.
|
 |
5
|
|
| |
6
|
S. Chaudhuri, R. Ramakrishnan, and G. Weikum. Integrating db and ir technologies: What is the sound of one hand clapping? In CIDR, pages 1--12, 2005.
|
| |
7
|
R. Cyganiak. D2RQ benchemarking. http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2rq/benchmarks/.
|
| |
8
|
|
| |
9
|
B. Ding, J. X. Yu, S. Wang, L. Qin, X. Zhang, and X. Lin. Finding top-k min-cost connected trees in databases. In ICDE, 2007.
|
| |
10
|
|
| |
11
|
R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In PODS, 2001.
|
| |
12
|
R. Goldman, N. Shivakumar, S. Venkatasubramanian, and H. Garcia-Molina. Proximity search in databases. In VLDB, 1998.
|
 |
13
|
|
| |
14
|
P. J. Haas and J. M. Hellerstein. Ripple joins for online aggregation. In SIGMOD 1999, pages 287--298, 1999.
|
| |
15
|
V. Hristidis, L. Gravano, and Y. Papakonstantinou. Efficient IR-Style Keyword Search over Relational Databases. In VLDB, 2003.
|
| |
16
|
V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword search in relational databases. In VLDB, pages 670--681, 2002.
|
| |
17
|
|
| |
18
|
Varun Kacholia , Shashank Pandit , Soumen Chakrabarti , S. Sudarshan , Rushi Desai , Hrishikesh Karambelkar, Bidirectional expansion for keyword search on graph databases, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
| |
19
|
B. Kimelfeld and Y. Sagiv. Efficient engines for keyword proximity search. In WebDB, pages 67--72, 2005.
|
 |
20
|
|
| |
21
|
G. Koutrika, A. Simitsis, and Y. Ioannidis. Précis: The essence of a query answer. In ICDE, 2006.
|
 |
22
|
Fang Liu , Clement Yu , Weiyi Meng , Abdur Chowdhury, Effective keyword search in relational databases, Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 27-29, 2006, Chicago, IL, USA
[doi> 10.1145/1142473.1142536]
|
| |
23
|
Y. Luo, X. Lin, W. Wang, and X. Zhou. SPARK: Top-k keyword query in relational databases. Technical Report 0708, School of Computer Science and Engineering, University of New South Wales, 2007.
|
| |
24
|
N. Mamoulis, K. H. Cheng, M. L. Yiu, and D. W. Cheung. Efficient aggregation of ranked inputs. In ICDE, 2006.
|
| |
25
|
|
 |
26
|
|
| |
27
|
D. E. Rose and D. R. Cutting. Ranking for usability: Enhanced retrieval for short queries. Technical Report 163, Apple Technical Report, 1996.
|
 |
28
|
|
| |
29
|
M. Sayyadan, H. LeKhac, A. Doan, and L. Gravano. Efficient keyword search across heterogeneous relational databases. In ICDE, 2007.
|
| |
30
|
Q. Su and J. Widom. Indexing relational database content offline for efficient keyword-based search. In IDEAS, 2005.
|
| |
31
|
R. Wilkinson, J. Zobel, and R. Sacks-Davis. Similarity measures for short queries. In TREC, 1995.
|
| |
32
|
|
CITED BY 22
|
|
|
|
|
Guoliang Li , Beng Chin Ooi , Jianhua Feng , Jianyong Wang , Lizhu Zhou, EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, June 09-12, 2008, Vancouver, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Guoliang Li , Shengyue Ji , Chen Li , Jianhua Feng, Efficient type-ahead search on relational data: a TASTIER approach, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|
|
|
|
|
Yueguo Chen , Su Chen , Yu Gu , Mei Hui , Feng Li , Chen Liu , Liangxu Liu , Beng Chin Ooi , Xiaoyan Yang , Dongxiang Zhang , Yuan Zhou, MarcoPolo: a community system for sharing and integrating travel information on maps, Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, March 24-26, 2009, Saint Petersburg, Russia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Yi Chen , Wei Wang , Ziyang Liu , Xuemin Lin, Keyword search on structured and semi-structured data, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|
|
Thomas Neumann , Matthias Bender , Sebastian Michel , Ralf Schenkel , Peter Triantafillou , Gerhard Weikum, Distributed top-k aggregation queries at large, Distributed and Parallel Databases, v.26 n.1, p.3-27, August 2009
|
|
|
|
|
|
|
|