ABSTRACT
Querying XML data is a well-explored topic, with powerful database-style query languages such as XPath and XQuery set to become W3C standards. An equally compelling paradigm for querying XML documents is full-text search on textual content. In this paper, we study fundamental challenges that arise when we try to integrate these two querying paradigms. While keyword search is based on approximate matching, XPath has exact-match semantics. We address this mismatch by treating queries on structure as a "template" and looking for answers that best match both this template and the full-text search. To achieve this, we provide an elegant definition of relaxation on structure and define primitive operators that span the space of relaxations. Query answering is then based on ranking potential answers by how well they satisfy the structural and full-text search conditions. We set out desirable principles for ranking schemes and propose natural ranking schemes that adhere to these principles. We develop efficient algorithms for answering top-K queries and discuss results from a comprehensive set of experiments that demonstrate the utility and scalability of the proposed framework and algorithms.
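The abstract does not spell out the ranking functions or the top-K algorithms. As a minimal sketch only, assuming each candidate answer already carries a structural score (for example, how closely it matches the query template after relaxation) and a full-text score, top-K selection over a weighted combination of the two might look like the following. The `Candidate` class, the `top_k` function, its `weight` parameter, and the linear combination are all hypothetical illustrations, not the paper's ranking schemes or algorithms.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate answer (an XML element) with two precomputed scores."""
    node_id: str
    structural_score: float  # assumed: closeness to the (possibly relaxed) structural template
    fulltext_score: float    # assumed: keyword relevance of the element's text content


def top_k(candidates: Iterable[Candidate], k: int, weight: float = 0.5) -> List[Candidate]:
    """Return the k candidates with the highest combined score, best first.

    The linear combination below is an illustrative assumption; it is not
    the ranking scheme proposed in the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    def combined(c: Candidate) -> float:
        # Trade off structural fit against full-text relevance.
        return weight * c.structural_score + (1.0 - weight) * c.fulltext_score

    # heapq.nlargest avoids sorting the whole input when k is much smaller
    # than the number of candidates.
    return heapq.nlargest(k, candidates, key=combined)


if __name__ == "__main__":
    cands = [
        Candidate("a", 1.0, 0.2),
        Candidate("b", 0.5, 0.9),
        Candidate("c", 0.0, 1.0),
        Candidate("d", 1.0, 0.8),
    ]
    print([c.node_id for c in top_k(cands, 2)])  # → ['d', 'b']
```

With `weight=1.0` the ranking depends only on structure, and with `weight=0.0` only on keyword relevance. This sketch scores every candidate up front; an efficient top-K algorithm would instead try to prune candidates that cannot reach the top K, which is not attempted here.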