| Strategies for lifelong knowledge extraction from the web |
Source
|
International Conference On Knowledge Capture
Proceedings of the 4th international conference on Knowledge capture
Whistler, BC, Canada
SESSION: Text analysis for knowledge acquisition
Pages: 95 - 102
Year of Publication: 2007
ISBN:978-1-59593-643-1
ABSTRACT
The increasing availability of electronic text has made it possible to acquire information using a variety of techniques that leverage the expertise of both humans and machines. In particular, the field of Information Extraction (IE), in which knowledge is extracted automatically from text, has shown promise for large-scale knowledge acquisition. While IE systems can uncover assertions about individual entities with an increasing level of sophistication, text understanding -- the formation of a coherent theory from a textual corpus -- involves representation and learning abilities not achievable by today's IE systems. Compared to the individual relational assertions output by IE systems, a theory includes coherent knowledge of abstract concepts and the relationships among them. We believe that the ability to fully discover the richness of knowledge present within large, unstructured and heterogeneous corpora will require a lifelong learning process in which earlier learned knowledge is used to guide subsequent learning. This paper introduces Alice, a lifelong learning agent whose goal is to automatically discover a collection of concepts, facts and generalizations that describe a particular topic of interest directly from a large volume of Web text. Building upon recent advances in unsupervised information extraction, we demonstrate that Alice can iteratively discover new concepts and compose general domain knowledge with a precision of 78%.