Joint latent topic models for text and citations
Source
International Conference on Knowledge Discovery and Data Mining
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Las Vegas, Nevada, USA
SESSION: Research papers
Pages 542-550
Year of Publication: 2008
ISBN: 978-1-60558-193-4
Authors
Ramesh M. Nallapati, Stanford University, Stanford, CA, USA
Amr Ahmed, Carnegie Mellon University, Pittsburgh, PA, USA
Eric P. Xing, Carnegie Mellon University, Pittsburgh, PA, USA
William W. Cohen, Carnegie Mellon University, Pittsburgh, PA, USA
ABSTRACT
In this work, we address the problem of jointly modeling text and citations in the topic modeling framework. We present two models, Pairwise-Link-LDA and Link-PLSA-LDA. The Pairwise-Link-LDA model combines the ideas of LDA [4] and Mixed Membership Stochastic Block Models [1] and allows modeling arbitrary link structure. However, the model is computationally expensive, since it models the presence or absence of a citation (link) between every pair of documents. The second model addresses this problem by assuming that the link structure is a bipartite graph. As the name indicates, the Link-PLSA-LDA model combines the LDA and PLSA models into a single graphical model. Our experiments on a subset of Citeseer data show that both models predict unseen data better than the baseline model of Erosheva et al. [8], by capturing the notion of topical similarity between the contents of the cited and citing documents. Our experiments on two data sets show that the Link-PLSA-LDA model performs best on the citation prediction task while remaining highly scalable. We also present some interesting visualizations generated by each of the models.
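The cost of modeling a link between every pair of documents can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a stochastic-block-model style link generator in which each side of an ordered document pair draws a topic from its own topic proportions, and a Bernoulli variable with a topic-pair-specific probability decides whether the citation is present. The names `thetas`, `eta`, `sample_pairwise_links`, and `num_pairwise_link_variables` are hypothetical, and treating pairs as ordered (citing, cited) is an assumption.

```python
import random


def sample_topic(theta, rng):
    """Draw one topic index from a document's topic proportions."""
    return rng.choices(range(len(theta)), weights=theta, k=1)[0]


def sample_pairwise_links(thetas, eta, rng):
    """Draw a present/absent citation indicator for every ordered pair.

    thetas[d] : topic proportions of document d (assumed given, sums to 1)
    eta[k][l] : assumed probability of a link when the citing side draws
                topic k and the cited side draws topic l
    """
    n = len(thetas)
    links = {}
    for citing in range(n):
        for cited in range(n):
            if citing == cited:
                continue  # a document does not cite itself
            z_citing = sample_topic(thetas[citing], rng)
            z_cited = sample_topic(thetas[cited], rng)
            # One Bernoulli draw per ordered pair: this is the expensive part.
            links[(citing, cited)] = rng.random() < eta[z_citing][z_cited]
    return links


def num_pairwise_link_variables(n_docs):
    """Number of link indicators when every ordered pair is modeled."""
    return n_docs * (n_docs - 1)


rng = random.Random(0)
thetas = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]  # toy topic proportions
eta = [[0.6, 0.05], [0.05, 0.6]]  # same-topic pairs link more often
links = sample_pairwise_links(thetas, eta, rng)
print(len(links))  # 6 ordered pairs for 3 documents
print(num_pairwise_link_variables(10_000))  # 99990000
```

The number of link variables grows quadratically with the number of documents. A model that only accounts for observed citations from a citing set to a cited set, as a bipartite assumption permits, scales with the number of citations instead.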
REFERENCES
[1] E. Airoldi, D. Blei, E. Xing, and S. Fienberg. Mixed membership stochastic block models for relational data, with applications to protein-protein interactions. In International Biometric Society-ENAR Annual Meetings, 2006.
[3] D. Blei and J. Lafferty. Correlated topic models. In Advances in Neural Information Processing Systems, 2006.
[6] D. Cohn and T. Hofmann. The missing link - a probabilistic model of document content and hypertext connectivity. In Advances in Neural Information Processing Systems 13, 2001.
[8] E. Erosheva, S. Fienberg, and J. Lafferty. Mixed-membership models of scientific publications. Proceedings of the National Academy of Sciences, 101:5220-5227, 2004.
[9] T. Hofmann. Probabilistic latent semantic analysis. In Uncertainty in Artificial Intelligence, 1999.
[13] A. McCallum and K. Nigam. A comparison of event models for Naïve Bayes text classification. In AAAI-98 Workshop on Learning for Text Categorization, 1998.
[14] R. Nallapati and W. Cohen. Link-LDA-PLSA: a new unsupervised technique for topics and influence in blogs. In International Conference for Weblogs and Social Media, 2008.
[15] R. M. Nallapati, S. Ditmore, J. D. Lafferty, and K. Ung. Multiscale topic tomography. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007. doi:10.1145/1281192.1281249
[16] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Department of Computer Science, Stanford University, 1998.
[19] B. Taskar, M.-F. Wong, P. Abbeel, and D. Koller. Link prediction in relational data. In Neural Information Processing Systems, 2003.
[20] M. Wainwright and M. Jordan. Graphical models, exponential families, and variational inference. Technical report, Department of Statistics, UC Berkeley, 2003.