Topic modeling with network regularization
Source
International World Wide Web Conference
Proceedings of the 17th International Conference on World Wide Web, Beijing, China
Session: Data mining: modeling
Pages 101-110
Year of Publication: 2008
ISBN: 978-1-60558-085-2

Authors
Qiaozhu Mei, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Deng Cai, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Duo Zhang, University of Illinois at Urbana-Champaign, Urbana, IL, USA
ChengXiang Zhai, University of Illinois at Urbana-Champaign, Urbana, IL, USA

ABSTRACT
In this paper, we formally define the problem of topic modeling with network structure (TMN). We propose a novel solution to this problem that regularizes a statistical topic model with a harmonic regularizer based on a graph structure in the data. The proposed method bridges topic modeling and social network analysis, leveraging the power of both statistical topic models and discrete regularization. The output of this model summarizes topics in text well, maps each topic onto the network, and discovers topical communities. With a concrete choice of topic model and graph-based regularizer, our model can be applied to text mining problems such as author-topic analysis, community discovery, and spatial text mining. Empirical experiments on two different genres of data show that our approach is effective and improves on both text-oriented and network-oriented methods. The proposed model is general: it can be applied to any text collection with a mixture of topics and an associated network structure.
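The abstract describes the method only at a high level. As a rough illustration, the sketch below makes several assumptions that are not stated in the abstract: each network node is a document with a topic distribution p(topic_j | d); the underlying topic model is PLSA with log-likelihood L; the harmonic regularizer is the graph-Laplacian quadratic form R = 1/2 * sum over node pairs (u, v) of w(u, v) * sum_j (p(topic_j | u) - p(topic_j | v))^2; and the two are combined as O = -(1 - lambda) * L + lambda * R, to be minimized. The function names, the array layout, and the neighbor-averaging `smooth` step are hypothetical choices for illustration, not the paper's published algorithm.

```python
import numpy as np


def harmonic_regularizer(theta: np.ndarray, weights: np.ndarray) -> float:
    """Graph-Laplacian smoothness penalty on per-node topic distributions.

    theta:   (n_nodes, k) array; row u is assumed to hold p(topic_j | node u).
    weights: symmetric (n_nodes, n_nodes) non-negative edge-weight matrix.
    Returns 1/2 * sum over ordered node pairs (u, v) of
    w(u, v) * sum_j (theta[u, j] - theta[v, j])**2, which equals
    trace(theta.T @ L @ theta) with L = D - W (one term per undirected edge).
    """
    diff = theta[:, None, :] - theta[None, :, :]  # pairwise differences, (n, n, k)
    return float(0.5 * np.sum(weights[:, :, None] * diff ** 2))


def plsa_log_likelihood(counts: np.ndarray, doc_topic: np.ndarray,
                        topic_word: np.ndarray) -> float:
    """sum_d sum_w c(w, d) * log sum_j p(topic_j | d) * p(w | topic_j)."""
    p_word_given_doc = doc_topic @ topic_word  # (n_docs, vocab)
    mask = counts > 0  # skip zero counts so 0 * log(0) never arises
    return float(np.sum(counts[mask] * np.log(p_word_given_doc[mask])))


def regularized_objective(counts: np.ndarray, doc_topic: np.ndarray,
                          topic_word: np.ndarray, weights: np.ndarray,
                          lam: float) -> float:
    """Assumed form O = -(1 - lam) * L + lam * R; lower is better."""
    log_lik = plsa_log_likelihood(counts, doc_topic, topic_word)
    return -(1.0 - lam) * log_lik + lam * harmonic_regularizer(doc_topic, weights)


def smooth(theta: np.ndarray, weights: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical step: move each node's topic distribution toward the
    weighted average of its neighbors. Rows stay normalized because the
    result is a convex combination of distributions; isolated nodes are kept."""
    degree = weights.sum(axis=1, keepdims=True)
    neighbor_avg = np.divide(weights @ theta, degree,
                             out=theta.copy(), where=degree > 0)
    return (1.0 - alpha) * theta + alpha * neighbor_avg


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with unit edge weights and two topics.
    w = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
    theta = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
    print(harmonic_regularizer(theta, w))  # 1.0
    print(harmonic_regularizer(smooth(theta, w, alpha=0.5), w))  # 0.25
```

Under this assumed form, lambda = 0 reduces to plain PLSA, while lambda = 1 ignores the text and only rewards topic distributions that vary smoothly across the network; intermediate values trade the two off. The dense pairwise difference array costs O(n^2 k) memory, so a version for large graphs would instead sum over a sparse edge list.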