The statistical significance of the MUC-4 results
Source: Message Understanding Conference
Proceedings of the 4th conference on Message understanding
McLean, Virginia
SESSION: General papers
Pages: 30-50
Year of Publication: 1992
ISBN:1-55860-273-9
Author: Nancy Chinchor, Science Applications International Corporation, San Diego, CA
Publisher: Association for Computational Linguistics, Morristown, NJ, USA
DOI Bookmark: 10.3115/1072064.1072068

ABSTRACT

The MUC-4 scores of recall, precision, and the F-measures are used to measure the performance of the participating systems. A difference in scores between any two systems may be due to chance or may reflect a significant difference between the two systems. To assess whether a difference can be attributed to chance, statistical hypothesis testing is used. The method of hypothesis testing used is a computationally intensive method known as approximate randomization. This paper discusses the method and the statistical significance of the results for the two MUC-4 test sets, TST3 and TST4.
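
The idea behind approximate randomization can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the test statistic or the number of shuffles, so the function name approximate_randomization, the use of paired per-message scores, the absolute difference in mean score as the statistic, and the default of 9999 shuffles are all assumptions made for illustration. The significance level is estimated as (nge + 1) / (NS + 1), where NS is the number of shuffles and nge is the number of shuffles whose statistic is at least as large as the observed one.

```python
import random


def approximate_randomization(scores_a, scores_b, shuffles=9999, seed=0):
    """Estimate how often chance alone yields a difference as large as observed.

    scores_a, scores_b: hypothetical per-message scores for two systems,
    paired by message (an assumption; the abstract does not specify this).
    Returns the estimated significance level (nge + 1) / (NS + 1).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired and of equal length")
    if not scores_a:
        raise ValueError("score lists must not be empty")

    rng = random.Random(seed)
    n = len(scores_a)

    # Illustrative test statistic: absolute difference in mean score.
    observed = abs(sum(scores_a) - sum(scores_b)) / n

    at_least_as_extreme = 0
    for _ in range(shuffles):
        total_a = 0.0
        total_b = 0.0
        for a, b in zip(scores_a, scores_b):
            # Under the null hypothesis the system labels are exchangeable,
            # so each pair is swapped with probability 1/2.
            if rng.random() < 0.5:
                a, b = b, a
            total_a += a
            total_b += b
        if abs(total_a - total_b) / n >= observed:
            at_least_as_extreme += 1

    return (at_least_as_extreme + 1) / (shuffles + 1)
```

A small estimated level suggests that the observed difference between the two systems is unlikely to be due to chance. Because only a random sample of all possible relabelings is examined, the result is an approximation whose resolution is limited by the number of shuffles.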


REFERENCES

1. Noreen, E. W. (1989) Computer Intensive Methods for Testing Hypotheses: An Introduction. New York: John Wiley & Sons.
2. Efron, B. and R. Tibshirani (1991) "Statistical Data Analysis in the Computer Age" Science, vol. 253, pp. 390-395.