Configurable indexing and ranking for XML information retrieval
Source: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Sheffield, United Kingdom
SESSION: XML retrieval
Pages: 88-95
Year of Publication: 2004
ISBN: 1-58113-881-4
Authors
Shaorong Liu, University of California, Los Angeles, CA
Qinghua Zou, University of California, Los Angeles, CA
Wesley W. Chu, University of California, Los Angeles, CA
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 weeks): 3; Downloads (12 months): 91; Citation count: 11
DOI: http://doi.acm.org/10.1145/1008992.1009010

ABSTRACT

Indexing and ranking are two key factors for efficient and effective XML information retrieval. Inappropriate indexing may result in false negatives and false positives, and improper ranking may lead to low precision. In this paper, we propose a configurable XML information retrieval system in which users can configure appropriate index types for XML tags and text contents. Based on users' index configurations, the system transforms XML structures into a compact tree representation, Ctree, and indexes XML text contents. To support XML ranking, we propose the concepts of "weighted term frequency" and "inverted element frequency," where the weight of a term depends on its frequency and location within an XML element as well as its popularity among similar elements in an XML dataset. We evaluate the effectiveness of our system through extensive experiments on the INEX 03 dataset and 30 content-and-structure (CAS) topics. The experimental results show that our system achieves notably high precision in low-recall regions and the highest average precision (0.3309) compared with the 38 official INEX 03 submissions under the strict evaluation metric.
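The abstract names Ctree as a compact tree representation of XML structure but does not describe its layout. As a rough illustration only, the Python sketch below assumes a two-level design: at the group level, all elements that share the same label path (for example `article/sec/p`) collapse into one group; at the element level, each element keeps its group and a link to its parent. The function name `build_path_groups` and the path-based grouping rule are hypothetical and are not the authors' Ctree implementation.

```python
import xml.etree.ElementTree as ET


def build_path_groups(xml_text):
    """Hypothetical two-level index in the spirit of a compact tree (not Ctree itself).

    Group level: one group per distinct label path, holding the ids of all
    elements with that path, in document order.
    Element level: each element id maps to its label path and its parent id.
    """
    root = ET.fromstring(xml_text)
    groups = {}          # label path -> list of element ids
    element_group = {}   # element id -> label path
    parent = {}          # element id -> parent element id (-1 for the root)
    next_id = 0

    def visit(elem, parent_path, parent_id):
        nonlocal next_id
        eid = next_id
        next_id += 1
        path = f"{parent_path}/{elem.tag}" if parent_path else elem.tag
        groups.setdefault(path, []).append(eid)
        element_group[eid] = path
        parent[eid] = parent_id
        for child in elem:
            visit(child, path, eid)

    visit(root, "", -1)
    return groups, element_group, parent


doc = "<article><sec><p>a</p><p>b</p></sec><sec><p>c</p></sec></article>"
groups, element_group, parent = build_path_groups(doc)
print(groups)
# {'article': [0], 'article/sec': [1, 4], 'article/sec/p': [2, 3, 5]}
```

Six elements collapse into three groups here, which is the kind of compaction a path summary offers: a structural query such as `//sec/p` can be matched once against the group level and then expanded to element ids.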

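The abstract does not give formulas for "weighted term frequency" and "inverted element frequency." The Python sketch below shows one tf-idf-style reading of them, under stated assumptions: the location weights in `LOCATION_WEIGHTS`, the log-ratio form with add-one smoothing, and the treatment of elements that share a label path as the "similar elements" are all hypothetical choices, not the paper's definitions.

```python
import math
from collections import Counter

# Hypothetical location weights (assumption): a term in an element's
# title counts twice as much as the same term in its body text.
LOCATION_WEIGHTS = {"title": 2.0, "body": 1.0}


def weighted_tf(occurrences, weights=LOCATION_WEIGHTS):
    """Sum location weights over the (term, location) occurrences in one element."""
    wtf = Counter()
    for term, location in occurrences:
        wtf[term] += weights.get(location, 1.0)
    return wtf


def inverted_element_frequency(term, similar_elements):
    """IDF-style rarity of a term among similar elements.

    similar_elements is a list of term sets, one per element that shares the
    scored element's label path (our assumed notion of 'similar').
    Add-one smoothing keeps the ratio finite when no element contains the term.
    """
    n = len(similar_elements)
    containing = sum(1 for terms in similar_elements if term in terms)
    return math.log((n + 1) / (containing + 1))


def score(query_terms, occurrences, similar_elements):
    """Score of one element: sum over query terms of wtf * ief."""
    wtf = weighted_tf(occurrences)
    return sum(wtf[t] * inverted_element_frequency(t, similar_elements)
               for t in query_terms)


occurrences = [("xml", "title"), ("xml", "body"), ("retrieval", "body")]
similar = [{"xml", "retrieval"}, {"retrieval"}, {"database"}]
print(round(score(["xml"], occurrences, similar), 4))  # 2.0794
```

In this example the element mentions "xml" once in its title and once in its body, giving a weighted frequency of 3.0; because only one of the three similar elements contains "xml", that term is rarer, and therefore weighted more heavily, than "retrieval", which two of them contain.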
