Unsupervised question answering data acquisition from local corpora
Source: Conference on Information and Knowledge Management
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Washington, D.C., USA
SESSION: IR-7 (information retrieval): natural language processing for IR
Pages: 607-614
Year of Publication: 2004
ISBN: 1-58113-874-1
Authors
Lucian Vlad Lita, Carnegie Mellon University, Pittsburgh, PA
Jaime Carbonell, Carnegie Mellon University, Pittsburgh, PA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 50,   Citation Count: 2
DOI: http://doi.acm.org/10.1145/1031171.1031283
ABSTRACT

Data-driven approaches in question answering (QA) are increasingly common. Because training data for such approaches is very limited, we propose an unsupervised algorithm that generates high-quality question-answer pairs from local corpora. The algorithm is ontology independent and requires only very small seed data as its starting point. Two alternating views of the data make learning possible: 1) question types are viewed as relations between entities and 2) question types are described by their corresponding question-answer pairs. These two aspects of the data allow us to construct an unsupervised algorithm that acquires high-precision question-answer pairs. We show the quality of the acquired data for different question types and perform a task-based evaluation. With each iteration, pairs acquired by the unsupervised algorithm are used as training data for a simple QA system. Performance increases with the number of question-answer pairs acquired, confirming the robustness of the unsupervised algorithm. We introduce the notion of semantic drift and show that it is a desirable quality in training data for question answering systems.
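
The two alternating views described in the abstract can be illustrated with a generic pattern-bootstrapping loop. The sketch below is not the authors' algorithm. It is a minimal illustration that assumes single-token entities and exact string contexts, and it uses a hypothetical function name (bootstrap_pairs), a hypothetical frequency threshold (min_pattern_count), and a toy corpus invented for the example.

```python
import re
from collections import Counter


def bootstrap_pairs(sentences, seed_pairs, iterations=2, min_pattern_count=2):
    """Grow a set of (entity, answer) pairs from a few seeds.

    Illustrative only: contexts are literal strings between the two
    entities, and both entities are assumed to be single tokens.
    """
    pairs = set(seed_pairs)
    for _ in range(iterations):
        # View 1: treat the question type as a relation between entities
        # and collect the text that links each known pair.
        context_counts = Counter()
        for entity, answer in pairs:
            linker = re.compile(
                re.escape(entity) + r"(.{1,30}?)" + re.escape(answer)
            )
            for sentence in sentences:
                match = linker.search(sentence)
                if match:
                    context_counts[match.group(1)] += 1

        # Keep only contexts supported by several known pairs.
        contexts = [
            c for c, n in context_counts.items() if n >= min_pattern_count
        ]

        # View 2: describe the question type by its pairs, so apply the
        # learned contexts to the corpus to harvest new pairs.
        for context in contexts:
            extractor = re.compile(r"(\w+)" + re.escape(context) + r"(\w+)")
            for sentence in sentences:
                for match in extractor.finditer(sentence):
                    pairs.add((match.group(1), match.group(2)))
    return pairs


# Toy corpus and seeds, invented for this example.
corpus = [
    "Mozart was born in 1756.",
    "Beethoven was born in 1770.",
    "Chopin was born in 1810.",
    "Verdi was born in 1813.",
    "Paris is located in France.",
]
seeds = {("Mozart", "1756"), ("Beethoven", "1770")}
acquired = bootstrap_pairs(corpus, seeds)
```

In this toy run the two seeds share the context " was born in ", which then yields the Chopin and Verdi pairs. The sentence about Paris contributes nothing because no seed pair supports its context. A realistic system would need looser matching and some scoring of candidate pairs, neither of which is attempted here.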


