ABSTRACT
Berners-Lee's compelling vision of a Semantic Web is hindered by a chicken-and-egg problem, which can best be solved by a bootstrapping method: creating enough structured data to motivate the development of applications. This paper argues that autonomously "Semantifying Wikipedia" is the best way to solve the problem. We choose Wikipedia as an initial data source because it is comprehensive, not too large, high-quality, and contains enough manually derived structure to bootstrap an autonomous, self-supervised process. We identify several types of structures that can be automatically enhanced in Wikipedia (e.g., link structure, taxonomic data, and infoboxes), and we describe a prototype implementation of a self-supervised machine learning system that realizes our vision. Preliminary experiments demonstrate the high precision of our system's extracted data; in one case, it matches the precision of humans.
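The abstract credits Wikipedia's manually derived structure (such as infoboxes) with making a self-supervised process possible. As a minimal sketch of that idea, assuming an infobox is available as a plain attribute-to-value dictionary, the hypothetical function `label_sentences` below uses an article's own infobox values to label its sentences automatically, yielding training data for a downstream extractor without manual annotation. The function names, the naive sentence splitter, and the substring-matching heuristic are illustrative assumptions, not the paper's actual implementation.

```python
import re
from typing import Dict, List, Optional, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def label_sentences(
    infobox: Dict[str, str], article_text: str
) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical self-supervised labeler.

    Labels each sentence with the first infobox attribute (in dict order)
    whose value occurs in it as a case-insensitive substring. Sentences that
    mention no value get None and can serve as negative training examples.
    """
    labeled: List[Tuple[str, Optional[str]]] = []
    for sentence in split_sentences(article_text):
        lowered = sentence.lower()
        label: Optional[str] = None
        for attribute, value in infobox.items():
            # Skip empty values so they do not match every sentence.
            if value and value.lower() in lowered:
                label = attribute
                break
        labeled.append((sentence, label))
    return labeled


if __name__ == "__main__":
    # Hypothetical infobox and article text, for illustration only.
    infobox = {"birth_place": "Ulm", "field": "physics"}
    text = "Albert Einstein was born in Ulm. He worked in physics. He played the violin."
    for sentence, label in label_sentences(infobox, text):
        print(label, "|", sentence)
```

Exact substring matching is deliberately crude: it misses paraphrased values and can fire on unrelated words that happen to contain a value, so a real system would need noise-tolerant learners or stricter matching on top of labels like these.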
REFERENCES
[3] S. Auer and J. Lehmann. What have Innsbruck and Leipzig in common? Extracting semantics from wiki content. In ESWC, 2007.
[4] M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. Open information extraction from the Web. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.
[5] T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, May 2001.
[9] R. de Salvo Braz, R. Girju, V. Punyakanok, D. Roth, and M. Sammons. An inference model for semantic entailment in natural language. In National Conference on Artificial Intelligence (AAAI), pages 1678-1679, 2005.
[10] S. Dill, N. Eiron, D. Gibson, D. Gruhl, R. Guha, A. Jhingran, T. Kanungo, S. Rajagopalan, A. Tomkins, J. A. Tomlin, and J. Y. Zien. SemTag and Seeker: Bootstrapping the Semantic Web via automated semantic annotation. In Proceedings of the 12th International Conference on World Wide Web, Budapest, Hungary, May 20-24, 2003. doi:10.1145/775152.775178.
[12] D. Downey, O. Etzioni, and S. Soderland. A probabilistic model of redundancy in information extraction. In Proceedings of IJCAI, 2005.
[13] S. Dumais, M. Banko, E. Brill, J. Lin, and A. Ng. Web question answering: Is more always better? In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland, August 11-15, 2002. doi:10.1145/564376.564428.
[14] O. Etzioni, M. Cafarella, D. Downey, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates. Unsupervised named-entity extraction from the Web: An experimental study. Artificial Intelligence, 165(1):91-134, June 2005. doi:10.1016/j.artint.2005.03.001.
[15] E. Gabrilovich and S. Markovitch. Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge. In Proceedings of the 21st National Conference on Artificial Intelligence, pages 1301-1306, 2006.
[16] E. Gabrilovich and S. Markovitch. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 2007.
[17] A. Y. Halevy, O. Etzioni, A. Doan, Z. G. Ives, J. Madhavan, L. McDowell, and I. Tatarinov. Crossing the structure chasm. In Proceedings of CIDR, 2003.
[20] B. MacCartney and C. D. Manning. Natural logic for textual inference. In Workshop on Textual Entailment and Paraphrasing, ACL 2007, 2007.
[21] A. K. McCallum. MALLET: A machine learning for language toolkit. http://mallet.cs.umass.edu, 2002.
[23] D. P. Nguyen, Y. Matsuo, and M. Ishizuka. Exploiting syntactic and semantic information for relation extraction from Wikipedia. In IJCAI07-TextLinkWS, 2007.
[24] K. Nigam, J. Lafferty, and A. McCallum. Using maximum entropy for text classification. In Proceedings of the IJCAI-99 Workshop on Machine Learning for Information Filtering, 1999.
[25] D. Opitz and R. Maclin. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, pages 169-198, 1999.
[26] S. P. Ponzetto and M. Strube. Deriving a large scale taxonomy from Wikipedia. In Proceedings of the 22nd National Conference on Artificial Intelligence, pages 1440-1445, 2007.
[27] E. Riloff and J. Shepherd. A corpus-based approach for building semantic lexicons. In Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, pages 117-124, Providence, RI, 1997.
[29] M. Völkel, M. Krötzsch, D. Vrandecic, H. Haller, and R. Studer. Semantic Wikipedia. In Proceedings of the 15th International Conference on World Wide Web, Edinburgh, Scotland, May 23-26, 2006. doi:10.1145/1135777.1135863.
[30] W. Wu, A. Doan, C. Yu, and W. Meng. Bootstrapping domain ontology for Semantic Web services from source web sites. In Proceedings of the VLDB-05 Workshop on Technologies for E-Services, 2005.
CITED BY 23
H. Wang, X. Jiang, L.-T. Chia, and A.-H. Tan. Ontology enhanced web image retrieval: Aided by Wikipedia & spreading activation theory. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, Vancouver, British Columbia, Canada, October 30-31, 2008.
R. Hoffmann, S. Amershi, K. Patel, F. Wu, J. Fogarty, and D. S. Weld. Amplifying community content creation with mixed initiative information extraction. In Proceedings of the 27th International Conference on Human Factors in Computing Systems, Boston, MA, USA, April 4-9, 2009.
E. J. Wagner, J. Liu, L. Birnbaum, and K. D. Forbus. Rich interfaces for reading news on the web. In Proceedings of the 13th International Conference on Intelligent User Interfaces, Sanibel Island, Florida, USA, February 8-11, 2009.
D. S. Weld, F. Wu, E. Adar, S. Amershi, J. Fogarty, R. Hoffmann, K. Patel, and M. Skinner. Intelligence in Wikipedia. In Proceedings of the 23rd National Conference on Artificial Intelligence, pages 1609-1614, Chicago, Illinois, July 13-17, 2008.
F. Chen, B. J. Gao, A. Doan, J. Yang, and R. Ramakrishnan. Optimizing complex extraction programs over evolving text data. In Proceedings of the 35th SIGMOD International Conference on Management of Data, Providence, Rhode Island, USA, June 29-July 2, 2009.