Autonomously Semantifying Wikipedia
Source
Conference on Information and Knowledge Management
Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
Lisbon, Portugal
SESSION: Semantic annotation (KM)
Pages 41-50  
Year of Publication: 2007
ISBN: 978-1-59593-803-9
Authors
Fei Wu, University of Washington, Seattle, WA
Daniel S. Weld, University of Washington, Seattle, WA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
ACM: Association for Computing Machinery
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 55,   Downloads (12 Months): 334,   Citation Count: 23
DOI: http://doi.acm.org/10.1145/1321440.1321449

ABSTRACT

Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which can best be solved by bootstrapping: creating enough structured data to motivate the development of applications. This paper argues that autonomously "Semantifying Wikipedia" is the best way to solve the problem. We choose Wikipedia as an initial data source because it is comprehensive, not too large, high-quality, and contains enough manually derived structure to bootstrap an autonomous, self-supervised process. We identify several types of structure that can be automatically enhanced in Wikipedia (e.g., link structure, taxonomic data, and infoboxes), and we describe a prototype implementation of a self-supervised machine learning system that realizes our vision. Preliminary experiments demonstrate the high precision of our system's extracted data, in one case equaling that of humans.


