ABSTRACT
Network data is ubiquitous, encoding collections of relationships between entities such as people, places, genes, or corporations. While many resources for networks of interesting entities are emerging, most of these can only annotate connections in a limited fashion. Although relationships between entities are rich, it is impractical to manually devise complete characterizations of these relationships for every pair of entities in large, real-world corpora. In this paper we present a novel probabilistic topic model that analyzes text corpora and infers descriptions of their entities and of the relationships between those entities. We develop variational methods for performing approximate inference in our model and demonstrate that it can be practically deployed on large corpora such as Wikipedia. We show qualitatively and quantitatively that our model can construct and annotate graphs of relationships and make useful predictions.