ABSTRACT
We propose and evaluate a family of measures, the eXtended Cumulated Gain (XCG) measures, for the evaluation of content-oriented XML retrieval approaches. Our aim is to provide an evaluation framework that takes into account the dependency among XML document components. In particular, two aspects of dependency are considered: (1) near-misses, which are document components that are structurally related to relevant components, such as a neighboring paragraph or container section, and (2) overlap, which arises when the same text fragment is referenced multiple times, for example, when a paragraph and its container section are both retrieved. A further consideration is that the measures should be flexible enough that different models of user behavior can be instantiated within them. Both system- and user-oriented aspects are investigated, and both recall-like and precision-like qualities are measured. We evaluate the reliability of the proposed measures on the INEX 2004 test collection. For example, we investigate the effects of assessment variation and topic set size on evaluation stability, and we establish upper and lower bounds on the expected error rates. The evaluation demonstrates that the XCG measures are stable and reliable and, in particular, that the novel measures of effort-precision and gain-recall (ep/gr) behave comparably to established IR measures such as precision and recall.
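The abstract names effort-precision and gain-recall without defining them. The sketch below illustrates one common reading, stated here as an assumption rather than as the article's own definitions: gain-recall at rank i is the gain cumulated by the run up to i divided by the total gain of an ideal ranking, and effort-precision at gain level r is the rank at which the ideal ranking reaches r divided by the rank at which the run reaches it. The function names and the toy gain values are hypothetical, and overlap and near-miss handling are not modelled.

```python
from itertools import accumulate

EPS = 1e-12  # tolerance for floating-point comparison of gain levels


def gain_recall(run_gains, ideal_gains):
    """Cumulated gain of the run at each rank, as a fraction of total ideal gain."""
    total = sum(ideal_gains)
    return [cg / total for cg in accumulate(run_gains)]


def _first_rank_reaching(gains, target):
    """1-based rank at which cumulated gain first reaches target, or None."""
    for rank, cg in enumerate(accumulate(gains), start=1):
        if cg + EPS >= target:
            return rank
    return None


def effort_precision(run_gains, ideal_gains, r):
    """Assumed definition: ideal effort / run effort to reach fraction r of total gain."""
    target = r * sum(ideal_gains)
    # The ideal ranking lists components in decreasing order of gain.
    i_ideal = _first_rank_reaching(sorted(ideal_gains, reverse=True), target)
    i_run = _first_rank_reaching(run_gains, target)
    # A run that never reaches the target gain gets zero effort-precision.
    return 0.0 if i_run is None else i_ideal / i_run


if __name__ == "__main__":
    ideal = [3, 2, 1]    # hypothetical gains of the relevant components
    run = [1, 0, 3, 2]   # hypothetical gains in the order a system returned them
    print(gain_recall(run, ideal))
    print(effort_precision(run, ideal, 0.5))
    print(effort_precision(run, ideal, 1.0))
```

Under this reading, a run that matches the ideal ranking scores an effort-precision of 1 at every gain level, and lower values mean the user must inspect more ranks than necessary to collect the same gain.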