Mining common topics from multiple asynchronous text streams
Source
Web Search and Web Data Mining
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Barcelona, Spain
SESSION: Web mining II
Pages 192-201
Year of Publication: 2009
ISBN: 978-1-60558-390-7
Authors
Xiang Wang, Tsinghua University, Beijing, China
Kai Zhang, Tsinghua University, Beijing, China
Xiaoming Jin, Tsinghua University, Beijing, China
Dou Shen, Microsoft Adcenter Labs, One Microsoft Way, Redmond, WA
Bibliometrics
Downloads (6 Weeks): 35, Downloads (12 Months): 312, Citation Count: 0
ABSTRACT
Text streams, such as news feeds and weblog archives, are becoming ubiquitous and produce large volumes of data. Topic mining is an effective way to explore both the semantic and the temporal information in text streams, and it can facilitate other knowledge discovery procedures. In many applications, we face multiple text streams that are related to each other and share common topics. The correlation among these streams can provide more meaningful and comprehensive clues for topic mining than any individual stream can. However, exploiting this correlation is nontrivial when the streams are asynchronous, i.e., when documents from different streams about the same topic have different timestamps; this problem remains unsolved in the context of topic mining. In this paper, we formally address this problem and put forward a novel algorithm based on the generative topic model. Our algorithm alternates between two steps: the first extracts common topics from multiple streams using the timestamps adjusted by the second step; the second adjusts the timestamps of the documents according to the time distribution of the topics discovered by the first step. We perform these two steps alternately, and monotone convergence of our objective function is guaranteed. The effectiveness and advantage of our approach are demonstrated by extensive empirical studies on two real data sets consisting of six research paper streams and two news article streams, respectively.
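The alternating structure described in the abstract can be illustrated with a small toy sketch. This is not the paper's generative model or objective: it assumes one hard topic label per document, smoothed count-based word and time distributions, and a fixed maximum timestamp shift. The function name `alternate_topic_time_sync` and the parameters `window`, `n_slots`, `n_iter` and `seed` are hypothetical, and no convergence guarantee is claimed for this simplified version.

```python
import math
import random


def _log_normalize(weights):
    """Turn positive weights into log-probabilities."""
    total = sum(weights)
    return [math.log(w / total) for w in weights]


def alternate_topic_time_sync(docs, stamps, n_topics, n_slots,
                              window=2, n_iter=10, seed=0):
    """Toy alternation between topic estimation and timestamp adjustment.

    docs   : list of token lists, pooled from all streams
    stamps : original integer timestamps, each in [0, n_slots)
    window : assumed maximum shift allowed for any timestamp
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    adjusted = list(stamps)  # working copy; the input list is not modified
    topic_of = [rng.randrange(n_topics) for _ in docs]
    eps = 0.01  # smoothing so that every log-probability is finite

    for _ in range(n_iter):
        # Step 1a: estimate per-topic word and time distributions
        # from the current topic labels and adjusted timestamps.
        word_w = [[eps] * len(vocab) for _ in range(n_topics)]
        time_w = [[eps] * n_slots for _ in range(n_topics)]
        for doc, k, t in zip(docs, topic_of, adjusted):
            for w in doc:
                word_w[k][index[w]] += 1.0
            time_w[k][t] += 1.0
        log_word = [_log_normalize(row) for row in word_w]
        log_time = [_log_normalize(row) for row in time_w]

        # Step 1b: give each document the topic that best explains
        # both its words and its adjusted timestamp.
        for d, doc in enumerate(docs):
            t = adjusted[d]
            topic_of[d] = max(
                range(n_topics),
                key=lambda k: sum(log_word[k][index[w]] for w in doc)
                + log_time[k][t],
            )

        # Step 2: move each timestamp, within the allowed window around
        # its original value, to the slot most likely under its topic.
        for d, t0 in enumerate(stamps):
            lo, hi = max(0, t0 - window), min(n_slots - 1, t0 + window)
            k = topic_of[d]
            adjusted[d] = max(range(lo, hi + 1), key=lambda t: log_time[k][t])

    return topic_of, adjusted
```

Calling `alternate_topic_time_sync(docs, stamps, n_topics=2, n_slots=6)` returns one topic label and one adjusted timestamp per document. Each adjusted timestamp stays within `window` slots of its original value, which is a simplification chosen here to keep the adjustment bounded.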