ABSTRACT
In a wide range of business areas that deal with text data streams, including CRM, knowledge management, and Web monitoring services, it is important to discover topic trends and analyze their dynamics in real time. Specifically, we consider the following three tasks in topic trend analysis: 1) Topic Structure Identification: identifying what kinds of main topics exist and how important they are; 2) Topic Emergence Detection: detecting the emergence of a new topic and recognizing how it grows; 3) Topic Characterization: identifying the characteristics of each main topic. Real topic analysis systems may require that these three tasks be performed on-line rather than retrospectively, and that they be handled in a single framework. This paper proposes a new topic analysis framework that satisfies this requirement from a unifying viewpoint: a topic structure is modeled using a finite mixture model, and any change in a topic trend is tracked by learning the finite mixture model dynamically. Within this framework, we propose using a time-stamp based discounting learning algorithm to realize real-time topic structure identification. This enables the topic structure to be tracked adaptively by forgetting out-of-date statistics. Further, we apply the theory of dynamic model selection to detect changes in the main components of the finite mixture model, thereby realizing topic emergence detection. We demonstrate the effectiveness of our framework using real data collected at a help desk, showing that we are able to track the dynamics of topic trends in a timely fashion.
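The idea of forgetting out-of-date statistics can be illustrated with a minimal sketch. It is not the algorithm of this paper: it assumes a simple exponential decay of per-topic counts based on the time elapsed between documents, and it assumes that each document's topic membership probabilities are already available from some mixture model. The class name `DiscountedTopicWeights` and the parameters `decay_rate` and `responsibilities` are hypothetical names chosen for illustration.

```python
import math


class DiscountedTopicWeights:
    """Tracks topic mixing proportions with exponential time-based forgetting.

    Illustrative assumption only: older documents are down-weighted by
    exp(-decay_rate * elapsed_time) before each new document is added.
    """

    def __init__(self, n_topics, decay_rate):
        self.counts = [0.0] * n_topics  # discounted per-topic statistics
        self.decay_rate = decay_rate    # larger value forgets faster
        self.last_time = None           # time stamp of the latest update

    def update(self, timestamp, responsibilities):
        """Discount old statistics, then add one document's memberships."""
        if self.last_time is not None:
            # Clamp at zero so an out-of-order time stamp never inflates counts.
            elapsed = max(0.0, timestamp - self.last_time)
            factor = math.exp(-self.decay_rate * elapsed)
            self.counts = [c * factor for c in self.counts]
            self.last_time = max(self.last_time, timestamp)
        else:
            self.last_time = timestamp
        for k, r in enumerate(responsibilities):
            self.counts[k] += r

    def weights(self):
        """Return the current mixing proportions (uniform if no data yet)."""
        total = sum(self.counts)
        if total == 0.0:
            return [1.0 / len(self.counts)] * len(self.counts)
        return [c / total for c in self.counts]


if __name__ == "__main__":
    # Statistics halve for every unit of elapsed time.
    tracker = DiscountedTopicWeights(n_topics=2, decay_rate=math.log(2))
    tracker.update(0.0, [1.0, 0.0])  # document about topic 0 at time 0
    tracker.update(1.0, [0.0, 1.0])  # document about topic 1 at time 1
    print(tracker.weights())
```

With this choice of decay, the older document counts half as much as the newer one, so the second topic receives roughly two thirds of the weight. A rising weight for a component is the kind of signal that a model selection step could then examine when looking for an emerging topic.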