ABSTRACT
XML databases often contain documents comprising structured text. It is therefore important to integrate "information retrieval style" query evaluation, which is well suited to natural language text, with standard "database style" query evaluation, which handles structured queries efficiently. Relevance scoring is central to information retrieval. In the case of XML, this operation becomes more complex because the data required for scoring may reside not only in an element itself but also in its descendant elements. In this paper, we propose a bulk algebra, TIX, and describe how it can be used as a basis for integrating information retrieval techniques into a standard pipelined database query evaluation engine. We develop new evaluation strategies essential to obtaining good performance, including a stack-based TermJoin algorithm for efficiently scoring composite elements. We report results from an extensive experimental evaluation, which show, among other things, that the new TermJoin access method outperforms a direct implementation of the same functionality using standard operators by a large factor.
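To make the scoring difficulty concrete, the following is a minimal sketch of how a single stack can credit each term occurrence to every enclosing element in one pass. It is not the paper's TermJoin algorithm, which the abstract does not specify. It assumes that elements carry (start, end) interval numbers in document order and that term occurrences are given as positions in the same numbering; the names `Element` and `subtree_term_counts` are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Element(NamedTuple):
    # Hypothetical interval encoding: an element contains position p
    # when start < p < end (an assumption made for this sketch).
    tag: str
    start: int
    end: int


def subtree_term_counts(elements, term_positions):
    """Return {element: number of term occurrences anywhere in its subtree}."""
    counts = {}
    stack = []  # currently open elements as [element, count], outermost first

    def close_until(pos):
        # Pop every open element that ends before pos and fold its
        # count into its parent, so ancestors see descendant matches.
        while stack and stack[-1][0].end < pos:
            elem, n = stack.pop()
            counts[elem] = n
            if stack:
                stack[-1][1] += n

    elems = sorted(elements, key=lambda e: e.start)
    i = 0
    for pos in sorted(term_positions):
        # Open every element that starts before this occurrence.
        while i < len(elems) and elems[i].start < pos:
            close_until(elems[i].start)
            stack.append([elems[i], 0])
            i += 1
        close_until(pos)
        if stack:
            stack[-1][1] += 1  # credit the innermost enclosing element

    # Open and close whatever elements remain after the last occurrence.
    while i < len(elems):
        close_until(elems[i].start)
        stack.append([elems[i], 0])
        i += 1
    close_until(float("inf"))
    return counts
```

Each element is pushed and popped once and each occurrence touches only the top of the stack, so the cost is linear in the two sorted inputs rather than proportional to nesting depth per occurrence. A real scoring function would combine such counts with weights; this sketch only counts.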