Mining the web for answers to natural language questions
Source
Conference on Information and Knowledge Management
Proceedings of the tenth international conference on Information and knowledge management
Atlanta, Georgia, USA
Session: World Wide Web
Pages: 143-150
Year of Publication: 2001
ISBN: 1-58113-436-3
Authors
Dragomir R. Radev, University of Michigan, Ann Arbor, MI
Hong Qi, University of Michigan, Ann Arbor, MI
Zhiping Zheng, University of Michigan, Ann Arbor, MI
Sasha Blair-Goldensohn, University of Michigan, Ann Arbor, MI
Zhu Zhang, University of Michigan, Ann Arbor, MI
Weiguo Fan, University of Michigan, Ann Arbor, MI
John Prager, IBM TJ Watson Research Center, Hawthorne, NY
Bibliometrics
Downloads (6 weeks): 10; Downloads (12 months): 68; Citation count: 21
ABSTRACT
The web is now becoming one of the largest information and knowledge repositories. Many large-scale search engines (Google, Fast, Northern Light, etc.) have emerged to help users find information. In this paper, we study how we can effectively use these existing search engines to mine the Web and discover the "correct" answers to factual natural language questions. We propose a probabilistic algorithm called QASM (Question Answering using Statistical Models) that learns the best query paraphrase of a natural language question. We validate our approach for both local and web search engines using questions from the TREC evaluation. We also show how this algorithm can be combined with another algorithm (AnSel) to produce precise answers to natural language questions.
REFERENCES
[1] The Fast search engine. http://www.alltheweb.com, 2001.
[3] Adam L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer, Harry Printz, Luboš Ureš. The Candide system for machine translation. Proceedings of the workshop on Human Language Technology, March 8-11, 1994, Plainsboro, NJ. doi:10.3115/1075812.1075844
[5] Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Fredrick Jelinek, John D. Lafferty, Robert L. Mercer, Paul S. Roossin. A statistical approach to machine translation. Computational Linguistics, 16(2):79-85, June 1990.
[8] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39:1-38, 1977.
[9] The Excite query corpus. ftp://ftp.excite.com/pub/jack/Excite-Log-12201999.gz, 1999.
[11] S. Harabagiu, D. Moldovan, M. Pasca, R. Mihalcea, M. Surdeanu, R. Bunescu, R. Girju, V. Rus, and P. Morarescu. The TREC-9 question answering track evaluation. In Text Retrieval Conference TREC-9, Gaithersburg, MD, 2001.
[14] K. Knight and D. Marcu. Statistics-based summarization - step one: sentence compression. In Proceedings of the Seventeenth Annual Conference of the American Association for Artificial Intelligence, Austin, Texas, August 2000.
[16] A. Mikheev. Tagging sentence boundaries. In Proceedings of SIGIR 2000, 2000.
[17] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller. Introduction to WordNet: An on-line lexical database. International Journal of Lexicography (special issue), 3(4):235-312, 1990.
[21] John Prager, Eric Brown, Anni Coden, Dragomir Radev. Question-answering by predictive annotation. Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 184-191, July 24-28, 2000, Athens, Greece. doi:10.1145/345508.345574
[22] D. R. Radev, K. Libner, and W. Fan. An empirical evaluation of the capability of state-of-the-art search engines to answer natural language questions. Submitted, 2001.
[24] E. Voorhees and D. Tice. The TREC-8 question answering track evaluation. In Text Retrieval Conference TREC-8, Gaithersburg, MD, 2000.
CITED BY 21
Susan Dumais, Michele Banko, Eric Brill, Jimmy Lin, Andrew Ng. Web question answering: is more always better? Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, August 11-15, 2002, Tampere, Finland.
Oren Etzioni, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, Alexander Yates. Web-scale information extraction in KnowItAll (preliminary results). Proceedings of the 13th international conference on World Wide Web, May 17-20, 2004, New York, NY, USA.
Ganesh Ramakrishnan, Soumen Chakrabarti, Deepa Paranjpe, Pushpak Bhattacharya. Is question answering an acquired skill? Proceedings of the 13th international conference on World Wide Web, May 17-20, 2004, New York, NY, USA.
Dragomir Radev, Weiguo Fan, Hong Qi, Harris Wu, Amardeep Grewal. Probabilistic question answering on the web. Proceedings of the 11th international conference on World Wide Web, May 7-11, 2002, Honolulu, Hawaii, USA.