ABSTRACT
In this paper, we consider the problem of combining link and content analysis for community detection in networked data, such as paper citation networks and the World Wide Web. Most existing approaches combine link and content information through a generative model that generates both links and contents from a shared set of community memberships. These generative models have two shortcomings: they fail to consider additional factors that could affect the community memberships, and they fail to isolate the contents that are irrelevant to community memberships. To explicitly address these shortcomings, we propose a discriminative model for combining link and content analysis for community detection. First, we propose a conditional model for link analysis, in which we introduce hidden variables to explicitly model the popularity of nodes. Second, to alleviate the impact of irrelevant content attributes, we develop a discriminative model for content analysis. These two models are unified seamlessly via the community memberships. We present efficient algorithms, based on bound optimization and alternating projection, to solve the related optimization problems. Extensive experiments with benchmark data sets show that the proposed framework significantly outperforms the state-of-the-art approaches for combining link and content analysis for community detection.