ACM Home Page
Please provide us with feedback. Feedback
Efficient keyword search for smallest LCAs in XML databases
Full text PdfPdf (479 KB)
Source International Conference on Management of Data archive
Proceedings of the 2005 ACM SIGMOD international conference on Management of data table of contents
Baltimore, Maryland
SESSION: Research papers: XML query, update, and search table of contents
Pages: 527 - 538  
Year of Publication: 2005
ISBN:1-59593-060-4
Authors
Yu Xu  University of California, San Diego
Yannis Papakonstantinou  University of California, San Diego
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 12,   Downloads (12 Months): 116,   Citation Count: 37
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1066157.1066217
What is a DOI?

ABSTRACT

Keyword search is a proven, user-friendly way to query HTML documents in the World Wide Web. We propose keyword search in XML documents, modeled as labeled trees, and describe corresponding efficient algorithms. The proposed keyword search returns the set of smallest trees containing all keywords, where a tree is designated as "smallest" if it contains no tree that also contains all keywords. Our core contribution, the Indexed Lookup Eager algorithm, exploits key properties of smallest trees in order to outperform prior algorithms by orders of magnitude when the query contains keywords with significantly different frequencies. The Scan Eager variant is tuned for the case where the keywords have similar frequencies. We analytically and experimentally evaluate two variants of the Eager algorithm, along with the Stack algorithm [13]. We also present the XKSearch system, which utilizes the Indexed Lookup Eager, Scan Eager and Stack algorithms and a demo of which on DBLP data is available at http://www.db.ucsd.edu/projects/xksearch. Finally, we extend the Indexed Lookup Eager algorithm to answer Lowest Common Ancestor (LCA) queries.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. In ICDE, 2002.
 
2
V. Aguilera et al. Querying XML documents in XYleme. In SIGIR Workshop on XML and Information Retrieval, 2000.
 
3
 
4
BerkeleyDB. http://www.sleepycat.com/.
 
5
 
6
 
7
 
8
S. Cohen, J. Namou, Y. Kanza, and Y. Sagiv. XSEarch: A semantic search engine for XML. In VLDB, 2003.
 
9
10
 
11
 
12
13
 
14
V. Hristidis, N. Koudas, Y. Papakonstantinou, and D. Srivastava. Keyword Proximity Search in XML Trees. Available at http://www.db.ucsd.edu/publications/treeproximity.pdf.
 
15
V. Hristidis and Y. Papakonstantinou. Discover: Keyword search in relational databases. In VLDB, 2002.
 
16
V. Hristidis, Y. Papakonstantinou, and A. Balmin. Keyword proximity search on XML graphs. In ICDE, 2003.
 
17
 
18
Y. Li, C. Yu, and H. V. Jagadish. Schema-free xquery. In VLDB, 2004.
 
19
J. Naughton et al. The Niagara Internet Query System. IEEE Data Engineering Bulletin, 24(2):27--33, 2001.
 
20
 
21
 
22
D. Srivastava et al. Structural joins: A primitive for efficient XML query pattern matching. In ICDE, 2002.
23
 
24
 
25
 
26
 
27
XYZFind. http://www.searchtools.com/tools/xyzfind.html.

CITED BY  37
Collaborative Colleagues:
Yu Xu: colleagues
Yannis Papakonstantinou: colleagues