Intelligent information triage
Source Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
New Orleans, Louisiana, United States
Pages: 318 - 326  
Year of Publication: 2001
ISBN:1-58113-331-6
Authors
Sofus A. Macskassy  Rutgers Univ., Piscataway, NJ
Foster Provost  NYU Stern School of Business, New York, NY
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/383952.384015

ABSTRACT

In many applications, large volumes of time-sensitive textual information require triage: rapid, approximate prioritization for subsequent action. In this paper, we explore the use of prospective indications of the importance of a time-sensitive document, for the purpose of producing better document filtering or ranking. By prospective, we mean importance that could be assessed by actions that occur in the future. For example, a news story may be assessed (retrospectively) as being important, based on events that occurred after the story appeared, such as a stock price plummeting or the issuance of many follow-up stories. If a system could anticipate (prospectively) such occurrences, it could provide a timely indication of importance. Clearly, perfect prescience is impossible. However, sometimes there is sufficient correlation between the content of an information item and the events that occur subsequently. We describe a process for creating and evaluating approximate information-triage procedures that are based on prospective indications. Unlike many information-retrieval applications for which document labeling is a laborious, manual process, for many prospective criteria it is possible to build very large, labeled, training corpora automatically. Such corpora can be used to train text classification procedures that will predict the (prospective) importance of each document. This paper illustrates the process with two case studies, demonstrating the ability to predict whether a news story will be followed by many, very similar news stories, and also whether the stock price of one or more companies associated with a news story will move significantly following the appearance of that story. We conclude by discussing how the comprehensibility of the learned classifiers can be critical to success.
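
The automatic labeling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the `Story` record, the price-series layout, the one-day `horizon`, and the 5% `threshold` are all hypothetical choices made only for this example. Each story is labeled by whether the associated stock price moved by at least the threshold after the story appeared, so no manual annotation is needed.

```python
from dataclasses import dataclass


@dataclass
class Story:
    ticker: str  # company associated with the story (hypothetical field)
    day: int     # index of the story's day in that ticker's price series
    text: str


def label_by_price_move(story, prices, horizon=1, threshold=0.05):
    """Return True if the price at day + horizon differs from the price on
    the story's day by at least `threshold` (relative change), False if it
    does not, and None if the later price is not yet available."""
    series = prices[story.ticker]
    end = story.day + horizon
    if end >= len(series):
        return None  # the future is unknown, so the story cannot be labeled
    base = series[story.day]
    return abs(series[end] - base) / base >= threshold


def build_corpus(stories, prices, **kwargs):
    """Pair each labelable story text with its automatically derived label."""
    corpus = []
    for story in stories:
        label = label_by_price_move(story, prices, **kwargs)
        if label is not None:
            corpus.append((story.text, label))
    return corpus


# Toy data, invented for illustration only.
prices = {"ACME": [100.0, 100.0, 90.0, 91.0], "BETA": [50.0, 50.5, 50.4]}
stories = [
    Story("ACME", 1, "ACME recalls product"),
    Story("BETA", 0, "BETA opens office"),
    Story("BETA", 2, "BETA late story"),
]
corpus = build_corpus(stories, prices)
```

The resulting `(text, label)` pairs could then be passed to any text classifier. Stories whose outcome window has not yet elapsed are skipped rather than guessed.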

