Retrieving answers from frequently asked questions pages on the web
Source: Conference on Information and Knowledge Management
Proceedings of the 14th ACM International Conference on Information and Knowledge Management
Bremen, Germany
Session: Paper session IR-2 (information retrieval): question answering
Pages: 76-83
Year of Publication: 2005
ISBN:1-59593-140-6
Authors
Valentin Jijkoun  University of Amsterdam, Amsterdam, The Netherlands
Maarten de Rijke  University of Amsterdam, Amsterdam, The Netherlands
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 22,   Downloads (12 Months): 117,   Citation Count: 7
DOI: http://doi.acm.org/10.1145/1099554.1099571

ABSTRACT

We address the task of answering natural language questions by using the large number of Frequently Asked Questions (FAQ) pages available on the web. The task involves three steps: (1) fetching FAQ pages from the web; (2) automatic extraction of question/answer (Q/A) pairs from the collected pages; and (3) answering users' questions by retrieving appropriate Q/A pairs. We discuss our solutions for each of the three tasks, and give detailed evaluation results on a collected corpus of about 3.6 GB of text data (293K pages, 2.8M Q/A pairs), with real users' questions sampled from a web search engine log. Specifically, we propose simple but effective methods for Q/A extraction and investigate task-specific retrieval models for answering questions. Our best model finds answers for 36% of the test questions in the top 20 results. Our overall conclusion is that FAQ pages on the web provide an excellent resource for addressing real users' information needs in a highly focused manner.
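
As an illustration of step (3) and of the "answer in the top 20 results" measure, the following is a minimal sketch only. It is not the retrieval model evaluated in the paper: the TF-IDF-style term-overlap scoring, the function names rank_qa_pairs and success_at_k, and the toy data are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_qa_pairs(query, qa_pairs, k=20):
    """Rank (question, answer) pairs against a user query.

    Hypothetical scoring: term frequency in the stored question times a
    smoothed inverse document frequency, summed over the query terms.
    Pairs with no overlapping term are dropped.
    """
    docs = [tokenize(question) for question, _ in qa_pairs]
    n = len(docs)
    # Document frequency: in how many stored questions each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    query_terms = set(tokenize(query))

    scored = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = sum(
            tf[t] * math.log(1 + n / df[t]) for t in query_terms if t in tf
        )
        if score > 0:
            scored.append((score, idx))

    # Highest score first; ties are broken by original position.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [qa_pairs[idx] for _, idx in scored[:k]]


def success_at_k(ranked_by_query, is_relevant, k=20):
    """Fraction of queries with at least one relevant pair in the top k."""
    if not ranked_by_query:
        return 0.0
    hits = sum(
        1
        for query, ranked in ranked_by_query.items()
        if any(is_relevant(query, pair) for pair in ranked[:k])
    )
    return hits / len(ranked_by_query)
```

Under these assumptions, a reported figure such as 36% in the top 20 would correspond to success_at_k returning 0.36 with k=20, given some relevance judgment is_relevant supplied by assessors.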


