Soft pattern matching models for definitional question answering
Full text: PDF (397 KB)
Source
ACM Transactions on Information Systems (TOIS)
Volume 25, Issue 2 (April 2007)
Article No. 8
Year of Publication: 2007
ISSN: 1046-8188
Authors
Hang Cui, National University of Singapore, Singapore
Min-Yen Kan, National University of Singapore, Singapore
Tat-Seng Chua, National University of Singapore, Singapore
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 16, Downloads (12 Months): 153, Citation Count: 1
DOI: http://doi.acm.org/10.1145/1229179.1229182

ABSTRACT

We explore probabilistic lexico-syntactic pattern matching, also known as soft pattern matching, in a definitional question answering system. Most current systems use regular expression-based hard matching patterns to identify definition sentences. Such rigid surface matching often fares poorly when faced with language variations. We propose two soft matching models to address this problem: one based on bigrams and the other on the Profile Hidden Markov Model (PHMM). Both models provide a theoretically sound method to model pattern matching as a probabilistic process that generates token sequences. We demonstrate the effectiveness of the models on definition sentence retrieval for definitional question answering. We show that both models significantly outperform the state-of-the-art manually constructed hard matching patterns on recent TREC data.
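The abstract describes the bigram model only at a high level. As a rough illustration of why a probabilistic match helps, the following minimal sketch shows how a bigram-based soft match can return a graded score where a regular expression returns only match or no match. It is a generic interpolated bigram language model, not the authors' formulation: the `<TERM>` placeholder for the search term, the toy training windows, the interpolation weight `lam=0.7`, and the class name `BigramSoftPattern` are all hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Sequence

BOS, EOS = "<s>", "</s>"  # boundary markers added around each token window


class BigramSoftPattern:
    """Hypothetical interpolated bigram model over pattern token windows."""

    def __init__(self, instances: Iterable[Sequence[str]], lam: float = 0.7):
        self.lam = lam  # hypothetical weight on the bigram (vs. unigram) estimate
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for tokens in instances:
            seq = [BOS, *tokens, EOS]
            self.unigrams.update(seq)
            self.bigrams.update(zip(seq, seq[1:]))
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams) + 1  # +1 reserves mass for unseen tokens

    def p_unigram(self, w: str) -> float:
        # Add-one smoothing keeps unseen tokens at a small non-zero probability.
        return (self.unigrams[w] + 1) / (self.total + self.vocab_size)

    def p_bigram(self, prev: str, w: str) -> float:
        # Maximum-likelihood bigram estimate, linearly interpolated with the unigram.
        history = self.unigrams[prev]
        ml = self.bigrams[(prev, w)] / history if history else 0.0
        return self.lam * ml + (1 - self.lam) * self.p_unigram(w)

    def score(self, tokens: Sequence[str]) -> float:
        """Average log-probability per transition; higher = closer to the training patterns."""
        seq = [BOS, *tokens, EOS]
        logp = sum(math.log(self.p_bigram(a, b)) for a, b in zip(seq, seq[1:]))
        return logp / (len(seq) - 1)


# Toy training windows around a search term (hypothetical data; "<TERM>" = search term).
training = [
    ["<TERM>", ",", "a"],
    ["<TERM>", "is", "a"],
    ["<TERM>", "is", "the"],
    ["<TERM>", ",", "the"],
]
model = BigramSoftPattern(training)
exact = model.score(["<TERM>", "is", "a"])             # seen verbatim in training
partial = model.score(["<TERM>", ",", "an"])           # unseen token after a known prefix
unrelated = model.score(["<TERM>", "said", "yesterday"])
print(exact > partial > unrelated)  # True
```

The partially matching window scores between the exact match and the unrelated one. This graded behavior is what a hard pattern lacks. Dividing by the number of transitions keeps longer windows from being penalized only for having more tokens.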

A critical difference between the two models is that the PHMM has a more complex topology. We experimentally show that the PHMM can handle language variations more effectively but requires more training data to converge.
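To make the "more complex topology" concrete, the sketch below scores token sequences against a toy profile HMM using the standard forward algorithm. Match states follow a consensus pattern slot by slot. Insert states absorb extra tokens, and delete states skip pattern slots. As a result, a variant with an inserted or missing word still gets a non-zero probability. This sketch rests on several assumptions. The consensus pattern, the hand-set transition table `TRANS`, `match_p`, `vocab_size`, and the function name `forward_probability` are hypothetical. Parameter training (for example Baum-Welch, the EM algorithm for HMMs) is also omitted, even though the abstract notes that this is where the PHMM needs more data.

```python
from typing import Dict, List, Sequence

# Hypothetical, hand-set transition probabilities shared by every column.
# TRANS[a][b] = P(next state type b | current state type a); each row sums to 1.
# The Begin state reuses the "M" row.
TRANS: Dict[str, Dict[str, float]] = {
    "M": {"M": 0.8, "I": 0.1, "D": 0.1},
    "I": {"M": 0.6, "I": 0.3, "D": 0.1},
    "D": {"M": 0.7, "I": 0.1, "D": 0.2},
}


def forward_probability(tokens: Sequence[str], consensus: Sequence[str],
                        match_p: float = 0.9, vocab_size: int = 100) -> float:
    """P(tokens) under a toy profile HMM built around a consensus pattern."""
    L, n = len(consensus), len(tokens)
    insert_emit = 1.0 / vocab_size                      # insert states: uniform background
    mismatch_emit = (1.0 - match_p) / (vocab_size - 1)  # match state, non-consensus token

    def match_emit(k: int, tok: str) -> float:
        # Match state k (1-based) strongly prefers the k-th consensus token.
        return match_p if tok == consensus[k - 1] else mismatch_emit

    def incoming(f: Dict[str, List[List[float]]], k: int, i: int, to: str) -> float:
        # Sum over the M/I/D states of column k after i tokens, times the transition into `to`.
        return sum(f[s][k][i] * TRANS[s][to] for s in "MID")

    # f[s][k][i]: probability of emitting tokens[:i] and being in state s of column k.
    f = {s: [[0.0] * (n + 1) for _ in range(L + 1)] for s in "MID"}
    f["M"][0][0] = 1.0  # the column-0 match slot acts as the Begin state
    for i in range(n + 1):
        for k in range(L + 1):
            if i > 0:  # insert state I_k emits tokens[i-1] without advancing the column
                f["I"][k][i] = insert_emit * incoming(f, k, i - 1, "I")
            if k > 0:
                if i > 0:  # match state M_k emits tokens[i-1] and advances the column
                    f["M"][k][i] = match_emit(k, tokens[i - 1]) * incoming(f, k - 1, i - 1, "M")
                # delete state D_k advances the column silently (skips a pattern slot)
                f["D"][k][i] = incoming(f, k - 1, i, "D")
    # Leaving the last column by a match or delete transition reaches End.
    return sum(f[s][L][n] * (TRANS[s]["M"] + TRANS[s]["D"]) for s in "MID")


pattern = ["<TERM>", "is", "a"]  # hypothetical consensus pattern
exact = forward_probability(["<TERM>", "is", "a"], pattern)
inserted = forward_probability(["<TERM>", "is", "also", "a"], pattern)
deleted = forward_probability(["<TERM>", "a"], pattern)
unrelated = forward_probability(["yesterday", "<TERM>", "said"], pattern)
print(exact, inserted, deleted, unrelated)
```

A hard pattern such as "<TERM> is a" would reject both the inserted and the deleted variant. Here they get lower but non-zero probability. Raw sequence probabilities shrink as sequences get longer, so comparing candidates of different lengths would need some normalization, for example per token or against a background model. The sketch leaves that out.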

While we evaluate soft pattern models only on definitional question answering, we believe that both models are generic and can be extended to other areas where lexico-syntactic pattern matching can be applied.



