Content and structure in indexing and ranking XML
Source: WebDB; Vol. 67
Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
Paris, France
SESSION: Paper session 5: approximate and ranked query processing
Pages: 67 - 72
Year of Publication: 2004
ABSTRACT
Rooted in electronic publishing, XML is now widely used for modelling and storing structured text documents. Especially on the WWW, retrieval of XML documents is most useful in combination with a relevance-based ranking of the query result. Index structures with ranking support are therefore needed for fast access to relevant parts of large document collections. This paper proposes a classification scheme for both XML ranking models and index structures, which makes it possible to determine which index suits which ranking model. An analysis reveals that ranking parameters related to both the content and structure of the data are poorly supported by most known XML indices. The IR-CADG index, owing to its tight integration of content and structure, supports various XML ranking models in a very efficient retrieval process. Experiments show that it outperforms separate content/structure indexing by more than two orders of magnitude for large corpora of several hundred MB.
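The contrast the abstract draws between separate and integrated content/structure indexing can be illustrated with a toy sketch. This is not the IR-CADG index, whose design is not described in the abstract. The corpus, the function names, and the use of raw term frequency as the ranking score are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: (node_id, label path, text) triples.
NODES = [
    (1, "/article/title", "xml ranking"),
    (2, "/article/sec/p", "ranking xml documents with xml indices"),
    (3, "/article/sec/p", "relational databases"),
    (4, "/book/title", "xml basics"),
]

def build_separate(nodes):
    """Two independent indices: term -> {node: tf} and path -> {nodes}."""
    content, structure = defaultdict(dict), defaultdict(set)
    for node_id, path, text in nodes:
        structure[path].add(node_id)
        for term in text.split():
            content[term][node_id] = content[term].get(node_id, 0) + 1
    return content, structure

def build_combined(nodes):
    """One index keyed by (path, term) -> {node: tf}."""
    combined = defaultdict(dict)
    for node_id, path, text in nodes:
        for term in text.split():
            key = (path, term)
            combined[key][node_id] = combined[key].get(node_id, 0) + 1
    return combined

def rank(postings):
    """Order nodes by descending term frequency, then by node id."""
    return sorted(postings.items(), key=lambda item: (-item[1], item[0]))

def query_separate(content, structure, path, term):
    # Fetch every node containing the term, then filter by path membership.
    allowed = structure.get(path, set())
    hits = {n: tf for n, tf in content.get(term, {}).items() if n in allowed}
    return rank(hits)

def query_combined(combined, path, term):
    # A single lookup that is already restricted to the requested path.
    return rank(combined.get((path, term), {}))

content, structure = build_separate(NODES)
combined = build_combined(NODES)
print(query_separate(content, structure, "/article/sec/p", "xml"))
print(query_combined(combined, "/article/sec/p", "xml"))
```

Both queries return the same ranked nodes. In this sketch the separate variant touches every posting for the term before filtering by path, while the combined variant reads only postings that already match the path, which is one plausible reason an integrated index could be faster on large corpora.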
CITED BY 5
Sihem Amer-Yahia, Nick Koudas, Amélie Marian, Divesh Srivastava, David Toman, Structure and content scoring for XML, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway