|
ABSTRACT
Contextual text mining is concerned with extracting topical themes from a text collection with context information (e.g., time and location) and comparing/analyzing the variations of themes over different contexts. Since the topics covered in a document are usually related to the context of the document, analyzing topical themes within context can potentially reveal many interesting theme patterns. In this paper, we generalize some of these models proposed in the previous work and we propose a new general probabilistic model for contextual text mining that can cover several existing models as special cases. Specifically, we extend the probabilistic latent semantic analysis (PLSA) model by introducing context variables to model the context of a document. The proposed mixture model, called contextual probabilistic latent semantic analysis (CPLSA) model, can be applied to many interesting mining tasks, such as temporal text mining, spatiotemporal text mining, author-topic analysis, and cross-collection comparative analysis. Empirical experiments show that the proposed mixture model can discover themes and their contextual variations effectively.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang. Topic detection and tracking pilot study: Final report. In Proceedings of DARPA Broadcast News Transcription and Understanding Workshop, 1998.
|
| |
2
|
|
 |
3
|
|
 |
4
|
|
| |
5
|
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of Royal Statist. Soc. B, 39:1--38, 1977.
|
| |
6
|
T. L. Griffiths and M. Steyvers. Fiding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl.1):5228--5235, 2004.
|
| |
7
|
T. Hofmann. Probabilistic latent semantic analysis. In Proceedings of UAI'99.
|
 |
8
|
|
 |
9
|
|
| |
10
|
A. Kontostathis, L. Galitsky, W. M. Pottenger, S. Roy, and D. J. Phelps. A survey of emerging trend detection in textual data mining. Survey of Text Mining, pages 185--224, 2003.
|
 |
11
|
|
 |
12
|
|
 |
13
|
|
 |
14
|
|
 |
15
|
Ramesh Nallapati , Ao Feng , Fuchun Peng , James Allan, Event threading within news topics, Proceedings of the thirteenth ACM international conference on Information and knowledge management, November 08-13, 2004, Washington, D.C., USA
[doi> 10.1145/1031171.1031258]
|
| |
16
|
|
 |
17
|
Mark Steyvers , Padhraic Smyth , Michal Rosen-Zvi , Thomas Griffiths, Probabilistic author-topic models for information discovery, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
[doi> 10.1145/1014052.1014087]
|
 |
18
|
|
CITED BY 15
|
|
Qiaozhu Mei , Xu Ling , Matthew Wondra , Hang Su , ChengXiang Zhai, Topic sentiment mixture: modeling facets and opinions in weblogs, Proceedings of the 16th international conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Xuanhui Wang , ChengXiang Zhai , Xiao Hu , Richard Sproat, Mining correlated bursty topic patterns from coordinated text streams, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
|
|
|
|
|
|
Xu Ling , Jing Jiang , Xin He , Qiaozhu Mei , Chengxiang Zhai , Bruce Schatz, Generating gene summaries from biomedical literature: A study of semi-structured summarization, Information Processing and Management: an International Journal, v.43 n.6, p.1777-1791, November, 2007
|
|
|
|
|
|
Xu Ling , Qiaozhu Mei , ChengXiang Zhai , Bruce Schatz, Mining multi-faceted overviews of arbitrary topics in a text collection, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, August 24-27, 2008, Las Vegas, Nevada, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Youngho Kim , Yingshi Tian , Yoonjae Jeong , Ryu Jihee , Sung-Hyon Myaeng, Automatic discovery of technology trends from patent text, Proceedings of the 2009 ACM symposium on Applied Computing, March 08-12, 2009, Honolulu, Hawaii
|
|