ACM Home Page
Please provide us with feedback. Feedback
Harvesting, searching, and ranking knowledge on the web: invited talk
Full text PdfPdf (80 KB)
Source Web Search and Web Data Mining archive
Proceedings of the Second ACM International Conference on Web Search and Data Mining table of contents
Barcelona, Spain
SESSION: Invited talks table of contents
Pages 3-4  
Year of Publication: 2009
ISBN:978-1-60558-390-7
Author
Gerhard Weikum  Max Planck Institute for Informatics, Saarbruecken, Germany
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
: Google
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
: Yahoo! Research
Microsoft : Microsoft
: Nokia
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 41,   Downloads (12 Months): 319,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1498759.1498763
What is a DOI?

ABSTRACT

There are major trends to advance the functionality of search engines to a more expressive semantic level (e.g., [2, 4, 6, 7, 8, 9, 13, 14, 18]). This is enabled by employing large-scale information extraction [1, 11, 20] of entities and relationships from semistructured as well as natural-language Web sources. In addition, harnessing Semantic-Web-style ontologies [22] and reaching into Deep-Web sources [16] can contribute towards a grand vision of turning the Web into a comprehensive knowledge base that can be efficiently searched with high precision.

This talk presents ongoing research towards this objective, with emphasis on our work on the YAGO knowledge base [23, 24] and the NAGA search engine [14] but also covering related projects. YAGO is a large collection of entities and relational facts that are harvested from Wikipedia and WordNet with high accuracy and reconciled into a consistent RDF-style "semantic" graph. For further growing YAGO from Web sources while retaining its high quality, pattern-based extraction is combined with logic-based consistency checking in a unified framework [25]. NAGA provides graph-template-based search over this data, with powerful ranking capabilities based on a statistical language model for graphs. Advanced queries and the need for ranking approximate matches pose efficiency and scalability challenges that are addressed by algorithmic and indexing techniques [15, 17].

YAGO is publicly available and has been imported into various other knowledge-management projects including DB-pedia. YAGO shares many of its goals and methodologies with parallel projects along related lines. These include Avatar [19], Cimple/DBlife [10, 21], DBpedia [3], Know-ItAll/TextRunner [12, 5], Kylin/KOG [26, 27], and the Libra technology [18, 28] (and more). Together they form an exciting trend towards providing comprehensive knowledge bases with semantic search capabilities.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Eugene Agichtein: Scaling Information Extraction to Large Document Collections. IEEE Data Eng. Bull. 28(4), 2005
2
 
3
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives: DBpedia: A Nucleus for a Web of Open Data. ISWC/ASWC 2007
 
4
Ricardo A. Baeza-Yates, Massimiliano Ciaramita, Peter Mika, Hugo Zaragoza: Towards Semantic Search. NLDB 2008
 
5
Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, Oren Etzioni: Open Information Extraction from the Web. IJCAI 2007
6
 
7
Michael J. Cafarella: Extracting and Querying a Comprehensive Web Database. CIDR 2009
 
8
Soumen Chakrabarti: Breaking Through the Syntax Barrier: Searching with Entities and Relations. ECML 2004
 
9
 
10
 
11
AnHai Doan, Luis Gravano, Raghu Ramakrishnan, Shivakumar Vaithyanathan (Editors): Special Issue on Information Extraction, SIGMOD Record 37(4), December 2008
 
12
 
13
 
14
Gjergji Kasneci, Fabian M. Suchanek, Georgiana Ifrim, Maya Ramanath, Gerhard Weikum: NAGA: Searching and Ranking Knowledge. ICDE 2008
 
15
Gjergji Kasneci, Maya Ramanath, Mauro Sozio, Fabian M. Suchanek, Gerhard Weikum: STAR: Steiner Tree Approximation in Relationship-Graphs. ICDE 2009
 
16
Jayant Madhavan, Loredana Afanasiev, Lyublena Antova, Alon Halevy: Harnessing the Deep Web: Present and Future. CIDR 2009
 
17
18
 
19
Frederick Reiss, Sriram Raghavan, Rajasekar Krishnamurthy, Huaiyu Zhu, Shivakumar Vaithyanathan: An Algebraic Approach to Rule-Based Information Extraction. ICDE 2008
 
20
 
21
 
22
Steffen Staab, Rudi Studer: Handbook on Ontologies, 2nd Edition. Springer 2008
23
 
24
 
25
Fabian Suchanek, Mauro Sozio, Gerhard Weikum: SOFIE: a Self-Organizing Framework for Information Extraction. Technical Report MPI-I-2008-5-004, 2008
26
27
28