ABSTRACT
We address the task of answering natural language questions by using the large number of Frequently Asked Questions (FAQ) pages available on the web. The task involves three steps: (1) fetching FAQ pages from the web; (2) automatic extraction of question/answer (Q/A) pairs from the collected pages; and (3) answering users' questions by retrieving appropriate Q/A pairs. We discuss our solutions for each of the three steps, and give detailed evaluation results on a collected corpus of about 3.6Gb of text data (293K pages, 2.8M Q/A pairs), with real users' questions sampled from a web search engine log. Specifically, we propose simple but effective methods for Q/A extraction and investigate task-specific retrieval models for answering questions. Our best model finds answers for 36% of the test questions in the top 20 results. Our overall conclusion is that FAQ pages on the web provide an excellent resource for addressing real users' information needs in a highly focused manner.
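The reported figure (answers found for 36% of the test questions in the top 20 results) can be read as a success-at-k measure with k = 20. The following is a minimal sketch of how such a measure might be computed, not the authors' evaluation code: the function name `success_at_k`, the dictionary layout, and the identifiers are all hypothetical, and the abstract does not state how a retrieved Q/A pair was judged to be an answer.

```python
from typing import Dict, List, Set


def success_at_k(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 20,
) -> float:
    """Fraction of questions with at least one relevant Q/A pair in the top k.

    rankings: question id -> ranked list of retrieved Q/A pair ids (hypothetical layout)
    relevant: question id -> set of Q/A pair ids judged to answer the question
    """
    if not rankings:
        return 0.0
    hits = 0
    for question_id, ranked_ids in rankings.items():
        gold = relevant.get(question_id, set())
        # A question counts as answered if any of its top-k results is relevant.
        if any(pair_id in gold for pair_id in ranked_ids[:k]):
            hits += 1
    return hits / len(rankings)
```

Under this reading, a value of 0.36 at k = 20 would correspond to the 36% reported above.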