ACM Home Page
Please provide us with feedback. Feedback
Indexing dataspaces
Full text PdfPdf (279 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2007 ACM SIGMOD international conference on Management of data table of contents
Beijing, China
SESSION: Database technology for novel applications table of contents
Pages: 43 - 54  
Year of Publication: 2007
ISBN:978-1-59593-686-8
Authors
Xin Dong  University of Washington, Seattle, WA
Alon Halevy  Google Inc., Mountain View, CA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 31,   Downloads (12 Months): 233,   Citation Count: 4
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1247480.1247487
What is a DOI?

ABSTRACT

Dataspaces are collections of heterogeneous and partially unstructured data. Unlike data-integration systems that also offer uniform access to heterogeneous data sources, dataspaces do not assume that all the semantic relationships between sources are known and specified. Much of the user interaction with dataspaces involves exploring the data, and users do not have a single schema to which they can pose queries. Consequently, it is important that queries are allowed to specify varying degrees of structure, spanning keyword queries to more structure-aware queries.

This paper considers indexing support for queries that combine keywords and structure. We describe several extensions to inverted lists to capture structure when it is present. In particular, our extensions incorporate attribute labels, relationships between data items, hierarchies of schema elements, and synonyms among schema elements. We describe experiments showing that our indexing techniques improve query efficiency by an order of magnitude compared with alternative approaches, and scale well with the size of the data.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. In Proc. of ICDE, 2002.
 
2
S. Al-Khalifa, H. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu. Structural joins: A primitive for efficient XML query pattern matching. In ICDE, 2002.
3
 
4
R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, New York, 1999.
 
5
M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. Open information extraction from the web. In IJCAI, 2007.
 
6
H. Bast and I. Weber. Type less, find more: Fast autocompletion search with a succinct index. In SigIR, 2006.
 
7
G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In Proc. of ICDE, 2002.
 
8
L. Blunschi, J.-P. Dittrich, O. R. Girard, S. K. Karakashian, and M. A. V. Salles. A dataspace odyssey: The iMeMex personal dataspace management system. In CIDR, 2007.
 
9
N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: Optimal XML pattern matching. In Sigmod, 2002.
 
10
S. Chakrabarti, K. Puniyani, and S. Das. Optimizing scoring functions and indexes for proximity search in type-annotated corpora. In WWW, 2006.
11
 
12
 
13
S.-Y. Chien, Z. Vagena, D. Zhang, V. J. Tsotras, and C. Zaniolo. Efficient structural joins on indexed XML documents. In Proc. of VLDB, 2002.
 
14
J. Cho and S. Rajagopalan. A fast regular expression indexing engine. In Proc. of ICDE, 2001.
15
 
16
B. F. Cooper, N. Sample, M. J.Franklin, G. R. Hjaltason,and M. Shadmon. A fast index for semistructured data. In Proc. of VLDB, 2001.
 
17
L. Denoyer and P. Gallinari. The Wikipedia XML Corpus. SIGIR Forum, 2006.
 
18
P. DeRose, W. Shen, F. Chen, Y. Lee, D. Burdick, A. Doan, and R. Ramakrishnan. D B Life: A community information management platform for the database research community. In CIDR, 2007.
 
19
X. Dong and A. Halevy. A Platform for Personal Information Management and Integration. In CIDR, 2005.
 
20
 
21
 
22
J. Graupmann, R. Schenkel, and G. Weikum. The SphereSearch engine for unified ranked retrieval of heterogeneous XML and web documents. In VLDB, 2005.
 
23
M. Gubanov and P. A. Berstein. Structural text search and comparison using automatically extracted schema. In WebDB, 2006.
 
24
A. Y. Halevy, M. J. Franklin, and D. Maier. Principles of dataspace systems. In PODS, 2006.
 
25
H. He and J. Yang. Multiresolution indexing of XML for frequent queries. In Proc. of ICDE, 2004.
 
26
V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword search in relational databases. In VLDB, 2002.
 
27
V. Hristidis, Y. Papakonstantinou, and A. Balmin. Keyword proximity search on XML graphs. In Proc. of ICDE, 2003.
 
28
Jena. http://jena.sourceforge.net/, 2005.
 
29
H. Jiang, H. Lu, W. Wang, and B. C. Ooi. XR-Tree: Indexing XML data for efficient structural joins. In ICDE, 2003.
 
30
Y.-J. Joung and L.-W. Yang. KISS: A simple prefix search scheme in P2P networks. In WebDB, 2006.
31
 
32
R. Kaushik, R. Krishnamurthy, J. F. Naughton, and R. Ramakrishnan. On the integration of structure indexes and inverted lists. In Proc. of SIGMOD, 2004.
 
33
R. Kaushik, P. Shenoy, P. Bohannon, and E. Gudes. Exploiting local similarity for indexing paths in graph-structured data. In Proc. of ICDE, 2002.
 
34
Lucene. http://jakarta.apache.org/lucene/docs/index.html,2005.
 
35
T. Milo and D. Suciu. Index structures for path expressions. In Proc. of ICDT, 1999.
 
36
P. Rao and B. Moon. PRIX: Indexing and querying XML using Prufer sequences. In ICDE, 2004.
 
37
M. Sayyadian, H. Lekhac, A. Doan, and L. Gravano. Efficient keyword search across heterogeneous relational databases. In ICDE, 2007.
 
38
A. Schmidt, F. Waas, M. Kersten, M. J. Carey, I. Manolescu, and R. Busse. XMark: A benchmark for XML data management. In VLDB, 2002.
39
40
 
41
W. Wang, H. Jiang, H. Lu, and J. X. Yu. PBiTree coding and efficient processing of containment joins. In ICDE, 2003.
 
42
I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and indexing documents and images. Morgan Kaufmann Publishers, San Francisco, 1999.
 
43
Y. Xu and Y. Papakonstantinou. Efficient keyword search for smallest LCAs in XML databases. In SIGMOD, 2005.
 
44
N. Zhang, T. Ozsu, I. F. Ilyas, and A. Aboulnaga. Fix: Feature-based indexing technique for XML documents. In VLDB, 2006.