ABSTRACT
Dataspaces are collections of heterogeneous and partially unstructured data. Unlike data-integration systems, which also offer uniform access to heterogeneous data sources, dataspaces do not assume that all the semantic relationships between sources are known and specified. Much of the user interaction with dataspaces involves exploring the data, and users do not have a single schema against which they can pose queries. Consequently, it is important that queries be allowed to specify varying degrees of structure, ranging from keyword queries to more structure-aware queries. This paper considers indexing support for queries that combine keywords and structure. We describe several extensions to inverted lists to capture structure when it is present. In particular, our extensions incorporate attribute labels, relationships between data items, hierarchies of schema elements, and synonyms among schema elements. We describe experiments showing that our indexing techniques improve query efficiency by an order of magnitude compared with alternative approaches, and scale well with the size of the data.
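As a rough illustration of what extending an inverted list with attribute labels could look like, the following minimal sketch indexes each token twice: once on its own, and once paired with the attribute it appeared under. The class name `AttributeInvertedIndex`, its methods, the key layout, and the sample data are all hypothetical choices made for this illustration. They are not the encoding used in the paper, and the sketch does not cover relationships, hierarchies, or synonyms.

```python
from collections import defaultdict


class AttributeInvertedIndex:
    """Toy inverted index whose keys optionally carry an attribute label.

    Hypothetical illustration only; not the paper's actual index layout.
    """

    def __init__(self):
        # Maps a key to the set of item ids containing it. A key is either
        # a bare token (str) or a (token, attribute) tuple.
        self._postings = defaultdict(set)

    def add(self, item_id, attributes):
        """Index one data item given as an {attribute: text value} dict."""
        for attr, value in attributes.items():
            for token in value.lower().split():
                # Plain keyword entry: supports structure-free queries.
                self._postings[token].add(item_id)
                # Attribute-qualified entry: supports structure-aware queries.
                self._postings[(token, attr.lower())].add(item_id)

    def search(self, keyword, attribute=None):
        """Return sorted ids of items matching the keyword.

        If an attribute is given, only matches under that attribute count.
        """
        token = keyword.lower()
        key = token if attribute is None else (token, attribute.lower())
        return sorted(self._postings.get(key, set()))


index = AttributeInvertedIndex()
index.add(1, {"name": "Alice Smith", "email": "alice@example.org"})
index.add(2, {"title": "Notes from Alice", "author": "Bob Jones"})

keyword_only = index.search("alice")           # any attribute
name_only = index.search("alice", "name")      # only the "name" attribute
title_only = index.search("alice", "title")    # only the "title" attribute
```

In this sketch a keyword query and an attribute-qualified query are both answered with a single posting-list lookup, at the cost of storing each token under two keys.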