ABSTRACT
The wide popularity of free-and-easy keyword-based searches over the World Wide Web has fueled the demand for incorporating keyword-based search over structured databases. However, most current research focuses on keyword-based search over a single structured data source. With the growing interest in distributed databases and service-oriented architectures over the Internet, it is important to extend this capability to multiple structured data sources. One of the most important problems in enabling such a query facility is selecting the data sources most relevant to a keyword query. Traditional database summary techniques developed in the IR literature for selecting unstructured data sources are inadequate for this problem, as they do not capture the structure of the data sources. In this paper, we study the database selection problem for relational data sources and propose a method that effectively summarizes the relationships between keywords in a relational database based on its structure. We develop effective ranking methods based on these keyword relationship summaries to select the most useful databases for a given keyword query. We have implemented our system on PlanetLab, and in that environment we run extensive experiments with real datasets to demonstrate the effectiveness of the proposed summarization method.