ABSTRACT
A general re-weighting method, called contextualization, for more efficient element ranking in XML retrieval is introduced. Re-weighting is based on the idea of using the ancestors of an element as a context: if the element appears in a good context, where "good" is interpreted in terms of probability of relevance, its weight is increased in relevance scoring; if the element appears in a bad context, its weight is decreased. The formal presentation of contextualization is given in a general XML representation and manipulation framework based on structural indices. This provides a general approach that is independent of weighting schemes and query languages.
Contextualization is evaluated with the INEX test collection. We tested four runs: no contextualization, and parent, root, and tower contextualization. The contextualization runs were significantly better than the run without contextualization. Root contextualization was the best among the re-weighted runs.
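The re-weighting idea can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: elements are addressed by Dewey-style structural indices written as tuples, the hypothetical function `contextualize` averages an element's own weight with the weights of its chosen context elements, and "tower" is taken to mean all ancestors of the element.

```python
from statistics import mean


def ancestors(index):
    """Proper ancestors of a Dewey index, nearest first.

    Example: (1, 2, 3) -> [(1, 2), (1,)]
    """
    return [index[:k] for k in range(len(index) - 1, 0, -1)]


def contextualize(weights, index, mode="none"):
    """Re-weight one element using its ancestors as context.

    ASSUMPTION: the combined score is the plain average of the element's
    own weight and the weights of the selected context elements. This is
    an illustrative choice, not a formula taken from the abstract.
    """
    own = weights[index]
    chain = ancestors(index)
    if mode == "none" or not chain:
        return own  # the root has no context to draw on
    if mode == "parent":
        context = [chain[0]]   # nearest ancestor
    elif mode == "root":
        context = [chain[-1]]  # outermost ancestor
    elif mode == "tower":
        context = chain        # assumed: every ancestor
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Ancestors without a stored weight are treated as weight 0.0.
    return mean([own] + [weights.get(a, 0.0) for a in context])


# Hypothetical weights: a strong root, a weak section, a middling paragraph.
weights = {(1,): 0.8, (1, 2): 0.2, (1, 2, 3): 0.4}
scores = {m: contextualize(weights, (1, 2, 3), m)
          for m in ("none", "parent", "root", "tower")}
```

With these made-up weights, the paragraph's score drops under parent contextualization (its parent is a weak context) and rises under root contextualization (its root is a strong context), which is the behavior the abstract describes.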
INDEX TERMS
Primary Classification:
H. Information Systems
  H.3 INFORMATION STORAGE AND RETRIEVAL
    H.3.3 Information Search and Retrieval
      Subjects: Retrieval models
Additional Classification:
E. Data
  E.1 DATA STRUCTURES
    Subjects: Trees
  E.5 FILES
    Subjects: Organization/structure
H. Information Systems
  H.2 DATABASE MANAGEMENT
    H.2.1 Logical Design
      Subjects: Data models
  H.3 INFORMATION STORAGE AND RETRIEVAL
    H.3.4 Systems and Software
      Subjects: Performance evaluation (efficiency and effectiveness)
General Terms: Design, Experimentation, Languages, Management, Measurement, Performance
Keywords: Dewey ordering, XML, contextualization, re-weighting, semi-structured data, structural indices, structured documents