StatSnowball: a statistical approach to extracting entity relationships
Source
International World Wide Web Conference archive
Proceedings of the 18th International Conference on World Wide Web
Madrid, Spain
SESSION: Data mining/session: statistical methods
Pages 101-110  
Year of Publication: 2009
ISBN: 978-1-60558-487-4
Authors
Jun Zhu  Tsinghua University, Beijing, China
Zaiqing Nie  Microsoft Research Asia, Beijing, China
Xiaojiang Liu  University of Science and Technology of China, Hefei, China
Bo Zhang  Tsinghua University, Beijing, China
Ji-Rong Wen  Microsoft Research Asia, Beijing, China
Sponsor
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1526709.1526724

ABSTRACT

Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples required, but they usually rely on heuristics to combine a set of strict hard rules, which limits their ability to generalize and results in low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which identifies various types of relations without requiring them to be specified in advance. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), a bootstrapping system that can perform both traditional relation extraction and Open IE.

StatSnowball uses discriminative Markov logic networks (MLNs) and softens hard rules by learning their weights in a maximum likelihood estimation sense. The MLN is a general model and can be configured to perform different levels of relation extraction. In StatSnowball, pattern selection is performed by solving an l1-norm penalized maximum likelihood estimation problem, which enjoys well-founded theory and efficient solvers. We extensively evaluate the performance of StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that, starting from a small number of seeds, StatSnowball achieves significantly higher recall without sacrificing high precision across iterations, and that joint inference in the MLN can further improve performance. Finally, StatSnowball is efficient, and we have developed a working entity relation search engine called Renlifang based on it.
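
The idea of selecting patterns through an l1-norm penalized maximum likelihood estimate can be illustrated with a small sketch. This is not the StatSnowball implementation and it does not use a Markov logic network: as a simplifying assumption, it swaps in a plain logistic regression over binary "pattern fired" features and fits it with proximal gradient descent. The pattern strings, the toy data, and all function names are hypothetical. The point is only that the l1 penalty drives the weight of an uninformative pattern to exactly zero, so the surviving non-zero weights act as the selected patterns.

```python
import math

def soft_threshold(value, threshold):
    """Proximal operator of the l1 norm: shrink `value` toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_l1_logistic(features, labels, penalty=0.1, step=0.5, iterations=500):
    """Minimize mean logistic loss + penalty * ||w||_1 by proximal gradient descent."""
    n_samples = len(features)
    n_patterns = len(features[0])
    weights = [0.0] * n_patterns
    for _ in range(iterations):
        # Gradient of the mean negative log-likelihood.
        gradient = [0.0] * n_patterns
        for row, label in zip(features, labels):
            predicted = sigmoid(sum(w * x for w, x in zip(weights, row)))
            for j, x in enumerate(row):
                gradient[j] += (predicted - label) * x / n_samples
        # Gradient step followed by soft-thresholding (the l1 proximal step).
        weights = [soft_threshold(w - step * g, step * penalty)
                   for w, g in zip(weights, gradient)]
    return weights

def select_patterns(weights, names):
    """Keep only the patterns whose learned weight is non-zero."""
    return [name for name, w in zip(names, weights) if w != 0.0]

# Hypothetical toy data: each row records which patterns matched a candidate
# entity pair; the label says whether the pair is a true relation instance.
pattern_names = ["X , CEO of Y", "X and Y"]
features = [
    [1, 1], [1, 0], [1, 1], [1, 0],  # true instances: first pattern always fires
    [0, 1], [0, 0], [0, 1], [0, 0],  # false instances: first pattern never fires
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_l1_logistic(features, labels)
selected = select_patterns(weights, pattern_names)
```

In this toy setting the second pattern fires equally often on true and false instances, so its gradient never exceeds the penalty and its weight stays at zero, while the first pattern keeps a positive weight. Raising `penalty` prunes more aggressively; lowering it keeps more patterns.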


