The statistical significance of the MUC-4 results
Source: Proceedings of the 4th Conference on Message Understanding (MUC-4), McLean, Virginia
Session: General papers
Pages: 30-50
Year of Publication: 1992
ISBN: 1-55860-273-9
Author: Nancy Chinchor, Science Applications International Corporation, San Diego, CA
Publisher: Association for Computational Linguistics, Morristown, NJ, USA
ABSTRACT
The MUC-4 scores of recall, precision, and the F-measures are used to measure the performance of the participating systems. A difference in scores between two systems may be due to chance or to a genuine difference in the systems' performance. To rule out chance as the explanation, statistical hypothesis testing is used. The method used is a computationally intensive technique known as approximate randomization. This paper discusses the method and the statistical significance of the results for the two MUC-4 test sets, TST3 and TST4.
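The approximate randomization test named in the abstract can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the paper's implementation. Each system is represented by hypothetical per-message tallies of correct, actual (produced) and possible (answer-key) slot fills. Partial credit is ignored, and the names `Counts`, `f_measure` and `approximate_randomization` are invented for this example. Recall is taken as correct/possible and precision as correct/actual, both pooled over the whole test set. The F-measure uses the standard weighted form F = (β² + 1)PR / (β²P + R). Under the null hypothesis that the two systems are interchangeable, each message's pair of tallies is swapped with probability 1/2 and the F difference is recomputed. The p-value is then estimated as (hits + 1)/(shuffles + 1), a common convention for this test.

```python
import random
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Counts:
    """Hypothetical per-message slot tallies for one system (no partial credit)."""
    correct: int   # system fills that match the answer key
    actual: int    # fills the system produced
    possible: int  # fills present in the answer key


def f_measure(msgs: Sequence[Counts], beta: float = 1.0) -> float:
    """Pool counts over a test set, then combine recall and precision.

    beta > 1 weights recall more heavily; beta < 1 weights precision.
    """
    cor = sum(m.correct for m in msgs)
    act = sum(m.actual for m in msgs)
    pos = sum(m.possible for m in msgs)
    recall = cor / pos if pos else 0.0
    precision = cor / act if act else 0.0
    if recall == 0.0 and precision == 0.0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1.0) * precision * recall / (b2 * precision + recall)


def approximate_randomization(
    sys_a: Sequence[Counts],
    sys_b: Sequence[Counts],
    shuffles: int = 9999,
    beta: float = 1.0,
    seed: int = 0,
) -> Tuple[float, float]:
    """Paired test of H0: the F difference between A and B is due to chance.

    Returns (observed |F_A - F_B|, estimated p-value).
    """
    if len(sys_a) != len(sys_b):
        raise ValueError("both systems must be scored on the same messages")
    rng = random.Random(seed)
    observed = abs(f_measure(sys_a, beta) - f_measure(sys_b, beta))
    hits = 0  # shuffles whose difference is at least as large as observed
    for _ in range(shuffles):
        shuf_a, shuf_b = [], []
        for ma, mb in zip(sys_a, sys_b):
            # Under H0 the system labels are exchangeable message by message,
            # so swap each message's pair of tallies with probability 1/2.
            if rng.random() < 0.5:
                ma, mb = mb, ma
            shuf_a.append(ma)
            shuf_b.append(mb)
        diff = abs(f_measure(shuf_a, beta) - f_measure(shuf_b, beta))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (hits + 1) / (shuffles + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up tallies for four messages, for illustration only.
    a = [Counts(8, 10, 12), Counts(5, 6, 9), Counts(7, 9, 10), Counts(4, 5, 8)]
    b = [Counts(6, 10, 12), Counts(4, 7, 9), Counts(5, 9, 10), Counts(3, 6, 8)]
    diff, p = approximate_randomization(a, b, shuffles=999)
    print(f"|F_A - F_B| = {diff:.3f}, p = {p:.3f}")
```

Shuffling at the message level keeps each message's tallies paired. Because F is computed from pooled counts rather than averaged per message, its sampling distribution is awkward to derive analytically, which is one reason a resampling test is convenient here. The four made-up messages in the example allow only 16 swap patterns. Two of those patterns reproduce the observed difference exactly, so the p-value stays large. A meaningful test needs a test set with many more messages.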