Classifying news stories using memory based reasoning
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Copenhagen, Denmark
Pages: 59 - 65  
Year of Publication: 1992
ISBN:0-89791-523-2
Authors
Brij Masand  Thinking Machines Corporation, 245 First Street, Cambridge, Massachusetts
Gordon Linoff  Thinking Machines Corporation, 245 First Street, Cambridge, Massachusetts
David Waltz  Thinking Machines Corporation, 245 First Street, Cambridge, Massachusetts and Center for Complex Systems at Brandeis University, Waltham, MA
Sponsors
Royal School of Lib.
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 53,   Citation Count: 46
DOI: http://doi.acm.org/10.1145/133160.133177

ABSTRACT

We describe a method for classifying news stories using Memory Based Reasoning (MBR, a k-nearest neighbor method) that does not require manual topic definitions. Using an already coded training database of about 50,000 stories from the Dow Jones Press Release News Wire, and SEEKER [Stanfill] (a text retrieval system that supports relevance feedback) as the underlying match engine, we assign codes to new, unseen stories with a recall of about 80% and a precision of about 70%. There are about 350 different codes to be assigned. Using a massively parallel supercomputer, we leverage the information already contained in the thousands of coded stories and are able to code a story in about 2 seconds. Given SEEKER, we achieved these results in about two person-months. We believe this approach is effective in reducing the development time needed to implement classification systems involving a large number of topics, for purposes such as classification and message routing.
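A minimal sketch of the MBR idea summarized above, under stated assumptions. SEEKER is not reproduced here: plain cosine similarity over word counts stands in for its relevance-feedback matching. Each of the k nearest coded stories votes for its codes with a weight equal to its similarity, and a code is kept if its total vote reaches a fixed fraction of the best vote. The names `assign_codes`, `precision_recall`, `k`, `rel_threshold` and the toy stories and codes are hypothetical illustrations, not the paper's parameters. Precision and recall are micro-averaged over all (story, code) pairs, one common way to score multi-code assignment.

```python
import math
import re
from collections import Counter

# ASSUMPTION: cosine similarity over word counts is a hypothetical stand-in for SEEKER.
def vectorize(text):
    """Lower-cased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def assign_codes(story, training, k=3, rel_threshold=0.5):
    """Assign codes to an unseen story from its k nearest coded neighbors.

    training: list of (text, set_of_codes) pairs.
    k and rel_threshold are hypothetical, illustrative parameters.
    """
    query = vectorize(story)
    neighbors = sorted(
        ((cosine(query, vectorize(text)), codes) for text, codes in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, codes in neighbors:
        if sim <= 0:
            continue  # neighbors sharing no words cast no votes
        for code in codes:
            votes[code] += sim  # similarity-weighted vote for each code
    if not votes:
        return set()
    best = max(votes.values())
    return {code for code, weight in votes.items() if weight >= rel_threshold * best}

def precision_recall(predicted, gold):
    """Micro-averaged precision and recall over parallel lists of code sets."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    return (tp / n_pred if n_pred else 0.0, tp / n_gold if n_gold else 0.0)

# Hypothetical toy "coded training database".
training = [
    ("oil prices rise as opec cuts output", {"ENERGY", "COMMODITIES"}),
    ("crude oil output falls", {"ENERGY"}),
    ("bank reports quarterly earnings gain", {"BANKING", "EARNINGS"}),
    ("software company earnings beat forecast", {"EARNINGS", "TECH"}),
]

stories = ["opec oil output cut lifts prices", "quarterly bank earnings"]
predicted = [assign_codes(s, training) for s in stories]
gold = [{"ENERGY"}, {"BANKING", "EARNINGS"}]
for story, codes in zip(stories, predicted):
    print(story, "->", sorted(codes))
print(precision_recall(predicted, gold))
```

In this sketch the only "model" is the stored list of coded stories, which mirrors the abstract's point that no manual topic definitions are needed; the toy run assigns one extra code to the first story, so precision falls below recall.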


REFERENCES


[1] Biebricher, Peter; Fuhr, Norbert; et al. "The Automatic Indexing System AIR/PHYS -- From Research to Application." Internal report, TH Darmstadt, Department of Computer Science, Darmstadt, Germany.

[3] Dasarathy, B. V. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos, California (1991).

[8] Lewis, David D. "An Evaluation of Phrasal and Clustered Representation on a Text Categorization Task." University of Chicago, personal communication, manuscript in progress.

[11] Stanfill, C. and Waltz, D. L. "The Memory-Based Reasoning Paradigm." Proc. Case-Based Reasoning Workshop, Clearwater Beach, FL (May 1988), pp. 414-424.

[13] Young, Sheryl R. and Hayes, Philip J. "Automatic Classification and Summarization of Banking Telexes." Proc. Second IEEE Conference on AI Applications, Miami Beach, FL (1985).

[14] Waltz, D. L. "Memory-Based Reasoning." In M. A. Arbib and J. A. Robinson (eds.), Natural and Artificial Parallel Computation, The MIT Press, Cambridge, Mass. (1990), pp. 251-276.
