ABSTRACT
Data-driven approaches to question answering (QA) are increasingly common. Because training data for such approaches is very limited, we propose an unsupervised algorithm that generates high-quality question-answer pairs from local corpora. The algorithm is ontology-independent and requires only a very small amount of seed data as its starting point. Two alternating views of the data make learning possible: 1) question types are viewed as relations between entities, and 2) question types are described by their corresponding question-answer pairs. These two views of the data allow us to construct an unsupervised algorithm that acquires high-precision question-answer pairs. We show the quality of the acquired data for different question types and perform a task-based evaluation. With each iteration, pairs acquired by the unsupervised algorithm are used as training data for a simple QA system. Performance increases with the number of question-answer pairs acquired, confirming the robustness of the unsupervised algorithm. We introduce the notion of semantic drift and show that it is a desirable quality in training data for question answering systems.