ABSTRACT
We consider the problem of analyzing word trajectories in both the time and frequency domains, with the specific goal of identifying important and less-reported, periodic and aperiodic words. A set of words with identical trends can be grouped together to reconstruct an event in a completely unsupervised manner. The document frequency of each word across time is treated as a time series, where each element is the document frequency–inverse document frequency (DFIDF) score at one time point. In this paper, we 1) applied spectral analysis to categorize features by event characteristics: important and less-reported, periodic and aperiodic; 2) modeled aperiodic features with a Gaussian density and periodic features with Gaussian mixture densities, and then detected each feature's burst using the truncated Gaussian approach; and 3) proposed an unsupervised greedy event detection algorithm that detects both aperiodic and periodic events. All of these methods can be applied to time series data in general. We extensively evaluated our methods on the 1-year Reuters News Corpus [3] and showed that they were able to uncover meaningful aperiodic and periodic events.
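As a rough illustration of the spectral-analysis step, the sketch below computes a periodogram for one word's time series and labels the word periodic or aperiodic from its dominant period. This is not the paper's implementation: the function names `dominant_period` and `classify`, the use of NumPy's FFT, and the cutoff of half the series length are all assumptions made for the example, and the input is taken to be an already-computed series of per-day scores.

```python
import numpy as np


def dominant_period(series):
    """Return (period, power) of the strongest non-zero frequency.

    `series` is a 1-D sequence of per-time-point scores for one word
    (hypothetical input; the score definition is not reproduced here).
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    # Periodogram: squared magnitude of the discrete Fourier transform.
    power = np.abs(np.fft.rfft(x)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0)
    # Skip index 0 (the mean component) and take the strongest remaining peak.
    k = 1 + int(np.argmax(power[1:]))
    return 1.0 / freqs[k], float(power[k])


def classify(series, period_cutoff=None):
    """Label a series 'periodic' or 'aperiodic' by its dominant period.

    The default cutoff (half the series length) is an assumption for
    this sketch, not a value taken from the abstract.
    """
    n = len(series)
    cutoff = n / 2 if period_cutoff is None else period_cutoff
    period, power = dominant_period(series)
    label = "aperiodic" if period > cutoff else "periodic"
    return label, period, power
```

Under these assumptions, a synthetic weekly pattern such as `1 + np.cos(2 * np.pi * np.arange(364) / 7)` is labeled periodic with a dominant period of about 7, while a single smooth bump in the middle of the year is labeled aperiodic because its strongest component has a period equal to the full series length. The returned power could then serve as a crude proxy for how heavily a word is reported.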
REFERENCES
[1] Apache lucene-core 2.0.0, http://lucene.apache.org.
[2] Google news alerts, http://www.google.com/alerts.
[3] Reuters corpus, http://www.reuters.com/researchandstandards/corpus/.
[5] J. Allan, V. Lavrenko, and H. Jin. First story detection in TDT is hard. In Proceedings of the Ninth International Conference on Information and Knowledge Management, pages 374--381, McLean, Virginia, United States, November 6-11, 2000. doi:10.1145/354756.354843
[8] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1--38, 1977.
[10] Q. He, K. Chang, and E.-P. Lim. A model for anticipatory event detection. In ER, pages 168--181, 2006.
[11] Q. He, K. Chang, E.-P. Lim, and J. Zhang. Bursty feature representation for clustering text streams. In SDM, accepted, 2007.
[16] W. D. Penny. Kullback-Leibler divergences of normal, gamma, Dirichlet and Wishart densities. Technical report, 2001.
[19] M. Vlachos, C. Meek, Z. Vagena, and D. Gunopulos. Identifying similarities, periodicities and bursts for online search queries. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, Paris, France, June 13-18, 2004. doi:10.1145/1007568.1007586
[21] Y. Yang, J. Zhang, J. Carbonell, and C. Jin. Topic-conditioned novelty detection. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, July 23-26, 2002. doi:10.1145/775047.775150
CITED BY 4
Canhui Wang, Min Zhang, Liyun Ru, Shaoping Ma, Automatic online news topic ranking using media focus and user attention based on aging theory, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA
Theodoros Lappas, Benjamin Arai, Manolis Platakis, Dimitrios Kotsakos, Dimitrios Gunopulos, On burstiness-aware search for document sequences, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 28-July 01, 2009, Paris, France