Configurable indexing and ranking for XML information retrieval
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Sheffield, United Kingdom
SESSION: XML retrieval
Pages: 88 - 95
Year of Publication: 2004
ISBN: 1-58113-881-4
ABSTRACT
Indexing and ranking are two key factors for efficient and effective XML information retrieval. Inappropriate indexing may result in false negatives and false positives, and improper ranking may lead to low precision. In this paper, we propose a configurable XML information retrieval system in which users can configure appropriate index types for XML tags and text contents. Based on the users' index configurations, the system transforms XML structures into a compact tree representation, Ctree, and indexes XML text contents. To support XML ranking, we propose the concepts of "weighted term frequency" and "inverted element frequency," where the weight of a term depends on its frequency and location within an XML element as well as its popularity among similar elements in an XML dataset. We evaluate the effectiveness of our system through extensive experiments on the INEX 03 dataset and 30 content-and-structure (CAS) topics. The experimental results show that our system has significantly high precision in low-recall regions and achieves the highest average precision (0.3309) when compared with 38 official INEX 03 submissions under the strict evaluation metric.
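The abstract names "weighted term frequency" and "inverted element frequency" but does not give their formulas. The following Python sketch shows one way such a ranking could work; it is not the paper's actual method. It assumes each element is a mapping from a location (for example a title or body child) to its tokens. The location weights, the logarithmic form of the inverted element frequency, and all function names are hypothetical choices made for illustration.

```python
import math

# Hypothetical location weights (an assumption, not from the paper):
# a term occurring in a title counts more than one in body text.
LOCATION_WEIGHTS = {"title": 2.0, "body": 1.0}


def weighted_term_frequency(element, term):
    """Location-weighted count of `term` in one element.

    `element` maps a location name to the list of tokens found there.
    Unknown locations fall back to a weight of 1.0.
    """
    return sum(
        LOCATION_WEIGHTS.get(location, 1.0) * tokens.count(term)
        for location, tokens in element.items()
    )


def inverted_element_frequency(elements, term):
    """Rarity of `term` among a group of similar elements.

    Assumed form: log(1 + N / n), where N is the number of elements and
    n is the number that contain the term. Returns 0.0 if none do.
    """
    containing = sum(
        1 for element in elements
        if any(term in tokens for tokens in element.values())
    )
    if containing == 0:
        return 0.0
    return math.log(1 + len(elements) / containing)


def score(element, elements, query_terms):
    """Sum of weighted term frequency times inverted element frequency."""
    return sum(
        weighted_term_frequency(element, term)
        * inverted_element_frequency(elements, term)
        for term in query_terms
    )


# Three toy "section" elements standing in for similar elements in a dataset.
sections = [
    {"title": ["xml", "retrieval"], "body": ["indexing", "xml", "data"]},
    {"title": ["ranking"], "body": ["xml", "ranking", "models"]},
    {"title": ["databases"], "body": ["relational", "storage"]},
]

query = ["xml", "ranking"]
ranked = sorted(
    range(len(sections)),
    key=lambda i: score(sections[i], sections, query),
    reverse=True,
)
print(ranked)
```

In this toy data the second section ranks first: "ranking" appears in only one of the three sections, so it has a higher inverted element frequency, and it occurs in that section's title, which carries the larger location weight.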