|
ABSTRACT
This paper gives an overview on the YAGO-NAGA approach to information extraction for building a conveniently searchable, large-scale, highly accurate knowledge base of common facts. YAGO harvests infoboxes and category names of Wikipedia for facts about individual entities, and it reconciles these with the taxonomic backbone of WordNet in order to ensure that all entities have proper classes and the class system is consistent. Currently, the YAGO knowledge base contains about 19 million instances of binary relations for about 1.95 million entities. Based on intensive sampling, its accuracy is estimated to be above 95 percent. The paper presents the architecture of the YAGO extractor toolkit, its distinctive approach to consistency checking, its provisions for maintenance and further growth, and the query engine for YAGO, coined NAGA. It also discusses ongoing work on extensions towards integrating fact candidates extracted from natural-language text sources.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Eugene Agichtein: Scaling Information Extraction to Large Document Collections. IEEE Data Eng. Bull. 28(4), 2005.
|
| |
2
|
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives: DBpedia: A Nucleus for a Web of Open Data. ISWC/ASWC 2007.
|
| |
3
|
Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, Oren Etzioni: Open Information Extraction from the Web. IJCAI 2007.
|
| |
4
|
Michael J. Cafarella, Christopher Re, Dan Suciu, Oren Etzioni: Structured Querying of Web Text Data: A Technical Challenge. CIDR 2007.
|
| |
5
|
Hamish Cunningham: An Introduction to Information Extraction. In: Encyclopedia of Language and Linguistics, 2nd Edition, Elsevier, 2005.
|
| |
6
|
Pedro DeRose , Warren Shen , Fei Chen , AnHai Doan , Raghu Ramakrishnan, Building structured web community portals: a top-down, compositional, and incremental approach, Proceedings of the 33rd international conference on Very large data bases, September 23-27, 2007, Vienna, Austria
|
| |
7
|
Minko Dudev, Shady Elbassuoni, Julia Luxenburger, Maya Ramanath, Gerhard Weikum: Personalizing the Search for Knowledge. PersDB 2008.
|
| |
8
|
Oren Etzioni , Michael Cafarella , Doug Downey , Ana-Maria Popescu , Tal Shaked , Stephen Soderland , Daniel S. Weld , Alexander Yates, Unsupervised named-entity extraction from the web: an experimental study, Artificial Intelligence, v.165 n.1, p.91-134, June 2005
[doi> 10.1016/j.artint.2005.03.001]
|
 |
9
|
|
| |
10
|
Gjergji Kasneci, Fabian M. Suchanek, Georgiana Ifrim, Maya Ramanath, Gerhard Weikum: NAGA: Searching and Ranking Knowledge. ICDE 2008.
|
| |
11
|
Gjergji Kasneci, Maya Ramanath, Mauro Sozio, Fabian M. Suchanek, Gerhard Weikum: STAR: Steiner Tree Approximation in Relationship-Graphs. ICDE 2009.
|
| |
12
|
Xiaoyong Liu, W. Bruce Croft: Statistical Language Modeling for Information Retrieval. Annual Review of Information Science and Technology 39, 2004.
|
 |
13
|
Zaiqing Nie , Yunxiao Ma , Shuming Shi , Ji-Rong Wen , Wei-Ying Ma, Web object retrieval, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
[doi> 10.1145/1242572.1242584]
|
| |
14
|
Frederick Reiss, Sriram Raghavan, Rajasekar Krishnamurthy, Huaiyu Zhu, Shivakumar Vaithyanathan: An Algebraic Approach to Rule-Based Information Extraction. ICDE 2008.
|
| |
15
|
|
| |
16
|
|
 |
17
|
|
 |
18
|
|
| |
19
|
|
 |
20
|
|
 |
21
|
|
| |
22
|
|
| |
23
|
Qi Zhang, Fabian M. Suchanek, Lihua Yue, Gerhard Weikum: TOB: Timely Ontologies for Business Relations. WebDB 2008.
|
 |
24
|
Jun Zhu , Zaiqing Nie , Ji-Rong Wen , Bo Zhang , Wei-Ying Ma, Simultaneous record detection and attribute labeling in web data extraction, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA
[doi> 10.1145/1150402.1150457]
|
CITED BY 2
|
|
Xiaoyong Chai , Ba-Quy Vuong , AnHai Doan , Jeffrey F. Naughton, Efficiently incorporating user feedback into information extraction and integration programs, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|
|
Fei Chen , Byron J. Gao , AnHai Doan , Jun Yang , Raghu Ramakrishnan, Optimizing complex extraction programs over evolving text data, Proceedings of the 35th SIGMOD international conference on Management of data, June 29-July 02, 2009, Providence, Rhode Island, USA
|
|