ACM Home Page
Please provide us with feedback. Feedback
Mining correlated bursty topic patterns from coordinated text streams
Full text PdfPdf (941 KB)
Source
International Conference on Knowledge Discovery and Data Mining archive
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
San Jose, California, USA
SESSION: Research track papers table of contents
Pages: 784 - 793  
Year of Publication: 2007
ISBN:978-1-59593-609-7
Authors
Xuanhui Wang  UIUC
ChengXiang Zhai  UIUC
Xiao Hu  UIUC
Richard Sproat  UIUC
Sponsors
ACM: Association for Computing Machinery
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 27,   Downloads (12 Months): 271,   Citation Count: 5
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Request Permissions Request Permissions    Review this Article  
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1281192.1281276
What is a DOI?

ABSTRACT

Previous work on text mining has almost exclusively focused on a single stream. However, we often have available multiple text streams indexed by the same set of time points (called coordinated text streams), which offer new opportunities for text mining. For example, when a major event happens, all the news articles published by different agencies in different languages tend to cover the same event for a certain period, exhibiting a correlated bursty topic pattern in all the news article streams. In general, mining correlated bursty topic patterns from coordinated text streams can reveal interesting latent associations or events behind these streams. In this paper, we define and study this novel text mining problem. We propose a general probabilistic algorithm which can effectively discover correlated bursty patterns and their bursty periods across text streams even if the streams have completely different vocabularies (e.g., English vs Chinese). Evaluation of the proposed method on a news data set and a literature data set shows that it can effectively discover quite meaningful topic patterns from both data sets: the patterns discovered from the news data set accurately reveal the major common events covered in the two streams of news articles (in English and Chinese, respectively), while the patterns discovered from two database publication streams match well with the major research paradigm shifts in database research. Since the proposed method is general and does not require the streams to share vocabulary, it can be applied to any coordinated text streams to discover correlated topic patterns that burst in multiple streams in the same period.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
2
 
3
 
4
J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang. Topic detection and tracking pilot study: Final report. In Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, 1998.
5
 
6
7
 
8
 
9
10
11
12
13
14
15
16
 
17
18
19
20
21
22
23
24
25


Collaborative Colleagues:
Xuanhui Wang: colleagues
ChengXiang Zhai: colleagues
Xiao Hu: colleagues
Richard Sproat: colleagues