| Extracting significant time varying features from text |
| Source
|
Conference on Information and Knowledge Management
Proceedings of the eighth international conference on Information and knowledge management
Kansas City, Missouri, United States
Pages: 38 - 45
Year of Publication: 1999
ISBN:1-58113-146-1
|
|
Authors
|
|
Russell Swan
|
Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts, Amherst, Massachusetts
|
|
James Allan
|
Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts, Amherst, Massachusetts
|
|
| Bibliometrics |
Downloads (6 Weeks): 6, Downloads (12 Months): 59, Citation Count: 25
|
|
|
ABSTRACT
We propose a simple statistical model for the frequency of occurrence of features in a stream of text. Adoption of this model allows us to use classical significance tests to filter the stream for interesting events. We tested the model by building a system and running it on a news corpus. By a subjective evaluation, the system worked remarkably well: almost all of the groups of identified tokens corresponded to news stories and were appropriately placed in time. A preliminary objective evaluation was also used to measure the quality of the system and it showed some of the weaknesses and the power of our approach.
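The abstract does not say which classical significance test is used. As a minimal sketch, assuming a Pearson chi-square test on a 2x2 contingency table (documents on a given day versus all other days, containing versus not containing a feature), the filtering idea might look like the following. The function names, the input layout, and the critical value 7.879 (chi-square, one degree of freedom, p = 0.005) are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero row or column total means the table carries no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def significant_features(docs_by_day, day, critical=7.879):
    """Return {feature: chi-square score} for features over-represented on `day`.

    docs_by_day maps a day label to a list of documents, each a list of tokens.
    This layout is a hypothetical choice made for the sketch.
    """
    in_day = docs_by_day[day]
    out_day = [doc for d, docs in docs_by_day.items() if d != day for doc in docs]
    # Document frequencies: count each token at most once per document.
    df_in = Counter(tok for doc in in_day for tok in set(doc))
    df_out = Counter(tok for doc in out_day for tok in set(doc))
    hits = {}
    for tok, a in df_in.items():
        b = len(in_day) - a    # documents on `day` without the feature
        c = df_out.get(tok, 0)  # documents on other days with the feature
        d = len(out_day) - c   # documents on other days without the feature
        # Keep only features that are more frequent on `day` than elsewhere.
        if a * d > b * c:
            score = chi_square_2x2(a, b, c, d)
            if score > critical:
                hits[tok] = score
    return hits
```

Features that pass the threshold on adjacent days could then be grouped into candidate events; that grouping step is not shown here.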
REFERENCES
| |
1
|
Facts on File, 1996. Facts on File, New York, 1997.
|
| |
2
|
J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang. Topic detection and tracking pilot study: Final report. In Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, pages 194-218, 1998.
|
| |
3
|
R. B. Allen. Timelines as information system interfaces. In Proceedings International Symposium on Digital Libraries, pages 175-180, Tsukuba, Japan, 1995.
|
| |
4
|
Yvonne M. M. Bishop, Stephen E. Fienberg, and Paul W. Holland. Discrete multivariate analysis: theory and practice. MIT Press, Cambridge, Massachusetts, 1974.
|
| |
5
|
Ido Dagan and Ronen Feldman. Keyword-based browsing and analysis of large document sets. In Proceedings of the Symposium on Document Analysis and Information Retrieval (SDAIR-96), Las Vegas, Nevada, 1996.
|
| |
6
|
|
| |
7
|
Robin L. Kullberg. Dynamic timelines: Visualizing historical information in three dimensions. Master's thesis, Massachusetts Institute of Technology Media Laboratory, 1995.
|
| |
8
|
|
| |
9
|
Ron Papka, James Allan, and Victor Lavrenko. UMass approaches to detection and tracking at TDT2. In Proceedings of the DARPA Broadcast Workshop, 1999.
|
 |
10
|
Catherine Plaisant, Brett Milash, Anne Rose, Seth Widoff, and Ben Shneiderman. LifeLines: visualizing personal histories. In Proceedings of the SIGCHI conference on Human factors in computing systems: common ground, pages 221 ff., Vancouver, British Columbia, Canada, April 13-18, 1996. doi:10.1145/238386.238493
|
 |
11
|
|
| |
12
|
|
CITED BY 25
Xuanhui Wang, ChengXiang Zhai, Xiao Hu, Richard Sproat, Mining correlated bursty topic patterns from coordinated text streams, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Huan Liu, Philip S. Yu, Time-dependent event hierarchy construction, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-15, 2007, San Jose, California, USA
Canhui Wang, Min Zhang, Liyun Ru, Shaoping Ma, Automatic online news topic ranking using media focus and user attention based on aging theory, Proceeding of the 17th ACM conference on Information and knowledge management, October 26-30, 2008, Napa Valley, California, USA