ABSTRACT
When an event occurs, it prompts information sources to publish related documents throughout its lifespan. The task of event detection is to automatically identify events and their related documents from a document stream, that is, a set of chronologically ordered documents collected from various information sources. Each event generally follows its own activeness pattern, so its status changes continuously during its lifespan. When an event is active, many related documents are published by various information sources. In contrast, when it is inactive, there are very few documents, but they are focused. Previous work on event detection did not consider the characteristics of an event's activeness and used rigid thresholds for detection. We propose a concept called the life profile, represented by a hidden Markov model, to model the activeness trends of events. In addition, we present LIPED, a general event detection framework that uses the learned life profiles and the burst-and-diverse characteristic to adjust event detection thresholds adaptively and that can be incorporated into existing event detection methods. Evaluation on the official TDT corpus under the official contest rules shows that existing detection methods achieve better cost and F1 performance when they incorporate LIPED than when they do not.
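The idea of tying a detection threshold to an event's activeness state can be illustrated with a minimal sketch. It is not the LIPED implementation: the two states, the Poisson emission model over per-period document counts, every probability and rate, and the threshold values are hypothetical choices made only for illustration. The sketch decodes the most likely activeness path with the Viterbi algorithm and then looks up a similarity threshold per period, lower when the event is assumed active (many, diverse documents) and higher when it is assumed inactive (few, focused documents).

```python
import math

# Hypothetical two-state life profile; all numbers below are illustrative
# assumptions, not values learned from or reported for the TDT corpus.
STATES = ("inactive", "active")
START = {"inactive": 0.8, "active": 0.2}
TRANS = {
    "inactive": {"inactive": 0.8, "active": 0.2},
    "active": {"inactive": 0.3, "active": 0.7},
}
RATE = {"inactive": 1.0, "active": 8.0}        # assumed mean documents per period
THRESHOLD = {"inactive": 0.6, "active": 0.3}   # assumed similarity thresholds


def log_poisson(k, lam):
    """Log-probability of observing k documents under a Poisson rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi(counts):
    """Return the most likely activeness state for each period in counts."""
    score = {s: math.log(START[s]) + log_poisson(counts[0], RATE[s]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at period t + 1
    for k in counts[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            pointers[s] = best
            score[s] = prev[best] + math.log(TRANS[best][s]) + log_poisson(k, RATE[s])
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def adaptive_thresholds(counts):
    """Map each period to the threshold of its decoded activeness state."""
    return [THRESHOLD[s] for s in viterbi(counts)]


if __name__ == "__main__":
    print(adaptive_thresholds([0, 0, 20, 20, 0, 0]))
```

In this toy setting a burst of documents moves the decoded state to "active" and relaxes the threshold for those periods, while quiet periods keep the stricter one. A real system would learn the model parameters from training data, for example with Baum-Welch, rather than fixing them by hand.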