StatSnowball: a statistical approach to extracting entity relationships
Source
International World Wide Web Conference
Proceedings of the 18th international conference on World wide web
Madrid, Spain
SESSION: Data mining/session: statistical methods
Pages 101-110
Year of Publication: 2009
ISBN: 978-1-60558-487-4
Authors
Jun Zhu, Tsinghua University, Beijing, China
Zaiqing Nie, Microsoft Research Asia, Beijing, China
Xiaojiang Liu, University of Science and Technology of China, Hefei, China
Bo Zhang, Tsinghua University, Beijing, China
Ji-Rong Wen, Microsoft Research Asia, Beijing, China
ABSTRACT
Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples required, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limits their ability to generalize and thus leads to low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring pre-specification. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), a bootstrapping system that can perform both traditional relation extraction and Open IE. StatSnowball uses discriminative Markov logic networks (MLNs) and softens hard rules by learning their weights via maximum likelihood estimation. The MLN is a general model and can be configured to perform different levels of relation extraction. In StatSnowball, pattern selection is performed by solving an l1-norm penalized maximum likelihood estimation problem, which enjoys well-founded theory and efficient solvers. We extensively evaluate the performance of StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that, starting from a small number of seeds, StatSnowball achieves significantly higher recall without sacrificing high precision across iterations, and that the joint inference of the MLN can improve performance. Finally, StatSnowball is efficient, and we have developed a working entity relation search engine called Renlifang based on it.
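The idea of selecting patterns through l1-norm penalized maximum likelihood estimation can be illustrated with a much simpler stand-in model. The sketch below is not the StatSnowball implementation and does not use Markov logic networks: it assumes a plain logistic regression over binary "pattern fired" features, fitted by proximal gradient descent with soft-thresholding. The pattern names, the toy data, and the function names (`soft_threshold`, `fit_l1_logistic`, `select_patterns`) are hypothetical and exist only for illustration. Patterns whose learned weight is exactly zero are treated as discarded.

```python
import math


def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(features, labels, lam=0.1, step=0.5, iters=500):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient."""
    n, d = len(features), len(features[0])
    w = [0.0] * d
    for _ in range(iters):
        # Gradient of the mean negative log-likelihood at the current w.
        grad = [0.0] * d
        for x, y in zip(features, labels):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(d):
                grad[j] += (p - y) * x[j] / n
        # Gradient step, then soft-thresholding for the l1 penalty.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad)]
    return w


def select_patterns(names, weights):
    """Keep only the patterns whose weight was not shrunk to zero."""
    return [name for name, wj in zip(names, weights) if wj != 0.0]


# Hypothetical toy data: each row is a candidate entity pair, each column
# says whether a (made-up) textual pattern matched; label 1 means the
# relation holds for that pair.
names = ["<A> married <B>", "<A> and <B>", "<A> met <B>"]
features = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
labels = [1, 1, 0, 0]

weights = fit_l1_logistic(features, labels)
print(select_patterns(names, weights))
```

In this toy setting the second pattern never fires and the third is uncorrelated with the labels, so the penalty keeps both weights at zero while the first pattern receives a positive weight. A larger `lam` prunes more aggressively; the value used here is an arbitrary assumption, not one taken from the paper.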