Web-scale information extraction in KnowItAll: (preliminary results)
Source: International World Wide Web Conference
Proceedings of the 13th international conference on World Wide Web
New York, NY, USA
SESSION: Information extraction
Pages: 100 - 110  
Year of Publication: 2004
ISBN:1-58113-844-X
Authors
Oren Etzioni  University of Washington, Seattle, WA
Michael Cafarella  University of Washington, Seattle, WA
Doug Downey  University of Washington, Seattle, WA
Stanley Kok  University of Washington, Seattle, WA
Ana-Maria Popescu  University of Washington, Seattle, WA
Tal Shaked  University of Washington, Seattle, WA
Stephen Soderland  University of Washington, Seattle, WA
Daniel S. Weld  University of Washington, Seattle, WA
Alexander Yates  University of Washington, Seattle, WA
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 29,   Downloads (12 Months): 228,   Citation Count: 77
DOI: http://doi.acm.org/10.1145/988672.988687

ABSTRACT

Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner. The paper describes preliminary experiments in which an instance of KnowItAll, running for four days on a single machine, was able to automatically extract 54,753 facts. KnowItAll associates a probability with each fact, enabling it to trade off precision and recall. The paper analyzes KnowItAll's architecture and reports on lessons learned for the design of large-scale information extraction systems.
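
The abstract states that a probability is attached to each extracted fact so that precision can be traded against recall. The sketch below illustrates only that general idea: keep the facts whose probability meets a threshold, then measure precision and recall at several thresholds. It is not KnowItAll's implementation. The facts, probabilities, correctness labels, and the name `precision_recall` are all hypothetical and invented for illustration.

```python
from typing import List, Tuple

# Hypothetical scored extractions: (fact, probability, is_actually_correct).
# These values are invented for illustration; they are not results from the paper.
ScoredFact = Tuple[str, float, bool]

SCORED_FACTS: List[ScoredFact] = [
    ("City(Seattle)", 0.95, True),
    ("City(Paris)", 0.90, True),
    ("City(Boston)", 0.70, True),
    ("City(Washington Post)", 0.60, False),
    ("City(Tacoma)", 0.40, True),
    ("City(Home Page)", 0.20, False),
]


def precision_recall(facts: List[ScoredFact], threshold: float) -> Tuple[float, float]:
    """Accept facts with probability >= threshold and score the accepted set."""
    accepted = [f for f in facts if f[1] >= threshold]
    total_correct = sum(1 for f in facts if f[2])
    correct_accepted = sum(1 for f in accepted if f[2])
    # Convention (an assumption): precision is 1.0 when nothing is accepted.
    precision = correct_accepted / len(accepted) if accepted else 1.0
    recall = correct_accepted / total_correct if total_correct else 0.0
    return precision, recall


if __name__ == "__main__":
    # A high threshold favors precision; a low threshold favors recall.
    for t in (0.9, 0.5, 0.1):
        p, r = precision_recall(SCORED_FACTS, t)
        print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, lowering the threshold from 0.9 to 0.1 raises recall from 0.50 to 1.00 while precision falls from 1.00 to about 0.67, which is the kind of trade-off a per-fact probability makes possible.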


