Extracting significant time varying features from text
Source: Proceedings of the Eighth International Conference on Information and Knowledge Management (CIKM)
Kansas City, Missouri, United States
Pages: 38-45
Year of Publication: 1999
ISBN: 1-58113-146-1
Authors
Russell Swan, Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts, Amherst, Massachusetts
James Allan, Center for Intelligent Information Retrieval, Department of Computer Science, University of Massachusetts, Amherst, Massachusetts
Sponsors
SIGART: ACM Special Interest Group on Artificial Intelligence
SIGIR: ACM Special Interest Group on Information Retrieval
SIGMIS: ACM Special Interest Group on Management Information Systems
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/319950.319956

ABSTRACT

We propose a simple statistical model for the frequency of occurrence of features in a stream of text. Adopting this model allows us to use classical significance tests to filter the stream for interesting events. We tested the model by building a system and running it on a news corpus. In a subjective evaluation, the system worked remarkably well: almost all of the groups of identified tokens corresponded to news stories and were appropriately placed in time. A preliminary objective evaluation was also used to measure the quality of the system; it revealed both the power of our approach and some of its weaknesses.
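The abstract does not spell out which significance test is used. The following is a minimal sketch, assuming that for each feature and each day a 2×2 contingency table is formed (documents on that day with and without the feature, versus documents on all other days with and without it), and that Pearson's chi-squared statistic with one degree of freedom decides whether the feature is unusually frequent that day. The function names chi_square_2x2 and significant_days, and the threshold 7.879 (the chi-squared critical value for 1 degree of freedom at alpha = 0.005), are illustrative assumptions, not necessarily the paper's exact formulation.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for the table
        [[a, b],
         [c, d]]
    a: documents on day t containing the feature
    b: documents on day t not containing the feature
    c: documents on other days containing the feature
    d: documents on other days not containing the feature
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin means the test is undefined; treat as "no evidence".
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def significant_days(doc_counts, feature_counts, critical=7.879):
    """Return the days on which a feature is significantly over-represented.

    doc_counts[t]     -- total documents in the stream on day t
    feature_counts[t] -- documents on day t that contain the feature
    critical          -- hypothetical threshold (chi-squared, 1 d.f., alpha = 0.005)
    """
    total_docs = sum(doc_counts)
    total_feat = sum(feature_counts)
    flagged = []
    for t, (n_t, f_t) in enumerate(zip(doc_counts, feature_counts)):
        a = f_t                     # feature present, day t
        b = n_t - f_t               # feature absent, day t
        c = total_feat - f_t        # feature present, other days
        d = (total_docs - n_t) - c  # feature absent, other days
        stat = chi_square_2x2(a, b, c, d)
        # The statistic is symmetric, so also require a*d > b*c to keep
        # only days where the feature is MORE frequent than elsewhere.
        if stat > critical and a * d > b * c:
            flagged.append(t)
    return flagged


if __name__ == "__main__":
    # Five days of 100 documents each; the feature spikes on day 2.
    print(significant_days([100] * 5, [2, 3, 40, 2, 3]))  # prints [2]
```

In this toy stream day 0 also exceeds the threshold (chi-squared of about 8.9), but only because the feature is rarer there than elsewhere; the direction check discards it, so only the burst on day 2 is reported as an event.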

