Deconstructing nuggets: the stability and reliability of complex question answering evaluation
Source
Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Amsterdam, The Netherlands
SESSION: Question answering
Pages: 327-334
Year of Publication: 2007
ISBN: 978-1-59593-597-7
Authors
Jimmy Lin, University of Maryland
Pengyi Zhang, University of Maryland
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1277741.1277799

ABSTRACT

A methodology based on "information nuggets" has recently emerged as the de facto standard by which answers to complex questions are evaluated. After several implementations in the TREC question answering tracks, the community has gained a better understanding of its many characteristics. This paper focuses on one particular aspect of the evaluation: the human assignment of nuggets to answer strings, which serves as the basis of the F-score computation. As a byproduct of the TREC 2006 ciQA task, identical answer strings were independently evaluated twice, which allowed us to assess the consistency of human judgments. Based on these results, we explored simulations of assessor behavior that provide a method to quantify scoring variations. Understanding these variations in turn lets researchers be more confident in their comparisons of systems.
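
The abstract does not spell out the F-score computation, so the following is only a minimal sketch of the nugget F-score as commonly described for the TREC question answering tracks, not the authors' code. It assumes recall is computed over vital nuggets only, precision is approximated through a length allowance of 100 non-whitespace characters per matched nugget, and beta is 3. The function name, parameter names, and example numbers are hypothetical. The sketch shows how two assessors who assign different nuggets to the same answer string produce different scores, which is the kind of variation the paper studies.

```python
def nugget_f_score(vital_matched, okay_matched, vital_total,
                   answer_length, beta=3.0, allowance_per_nugget=100):
    """Nugget-based F-score for one answer (assumed TREC-style definition).

    vital_matched / okay_matched: nuggets the assessor assigned to the answer.
    vital_total: number of vital nuggets on the assessor's list.
    answer_length: answer length in non-whitespace characters.
    beta and allowance_per_nugget are assumed defaults, not taken from the abstract.
    """
    # Recall counts only vital nuggets.
    recall = vital_matched / vital_total if vital_total else 0.0

    # Precision is approximated by a length allowance per matched nugget.
    allowance = allowance_per_nugget * (vital_matched + okay_matched)
    if answer_length <= allowance:
        precision = 1.0
    else:
        precision = 1.0 - (answer_length - allowance) / answer_length

    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / denominator


# Hypothetical case: the same 500-character answer judged by two assessors
# who disagree on whether one vital nugget is present.
score_a = nugget_f_score(vital_matched=2, okay_matched=1,
                         vital_total=4, answer_length=500)
score_b = nugget_f_score(vital_matched=1, okay_matched=1,
                         vital_total=4, answer_length=500)
print(f"assessor A: {score_a:.3f}, assessor B: {score_b:.3f}")
```

Because beta weights recall well above precision, a single disputed vital nugget shifts the score substantially in this example, which illustrates why the consistency of nugget assignment matters when comparing systems.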

