An example-based mapping method for text categorization and retrieval
Source: ACM Transactions on Information Systems (TOIS)
Volume 12, Issue 3 (July 1994)
Pages: 252-277
Year of Publication: 1994
ISSN: 1046-8188
Authors
Yiming Yang, Mayo Clinic
Christopher G. Chute, Mayo Clinic
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/183422.183424

ABSTRACT

A unified model for text categorization and text retrieval is introduced. We use a training set of manually categorized documents to learn word-category associations, and use these associations to predict the categories of arbitrary documents. Similarly, we use a training set of queries and their related documents to obtain empirical associations between query words and indexing terms of documents, and use these associations to predict the related documents of arbitrary queries. A Linear Least Squares Fit (LLSF) technique is employed to estimate the likelihood of these associations. Document collections from the MEDLINE database and Mayo patient records are used for studies on the effectiveness of our approach, and on how much the effectiveness depends on the choices of training data, indexing language, word-weighting scheme, and morphological canonicalization. Alternative methods are also tested on these data collections for comparison. It is evident that the LLSF approach uses the relevance information effectively within human decisions of categorization and retrieval, and achieves a semantic mapping of free texts to their representations in an indexing language. Such a semantic mapping leads to a significant improvement in categorization and retrieval, compared to alternative approaches.
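
The idea of learning word-category associations by a least squares fit can be illustrated with a small sketch. This is not the authors' implementation: the toy training texts, the category names, the raw term-frequency weighting, and the helper names (vectorize, solve, fit_llsf, rank_categories) are all assumptions made for illustration. The abstract does not say how the fit is computed, so the sketch swaps in regularized normal equations, with a small ridge term added only to keep the toy system solvable.

```python
from collections import Counter

def vectorize(text, vocab):
    """Bag-of-words term-frequency vector over a fixed vocabulary.
    Words outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.
    a is n x n and b is n x k, both given as lists of lists."""
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_llsf(X, Y, ridge=1e-3):
    """Return W (words x categories) minimizing
    ||X W - Y||^2 + ridge * ||W||^2 via the normal equations.
    The ridge term is an assumption of this sketch."""
    n_words, n_cats = len(X[0]), len(Y[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
            for j in range(n_words)] for i in range(n_words)]
    xty = [[sum(x[i] * y[c] for x, y in zip(X, Y)) for c in range(n_cats)]
           for i in range(n_words)]
    return solve(xtx, xty)

def rank_categories(text, W, vocab, categories):
    """Score each category for a new text and sort by descending score."""
    x = vectorize(text, vocab)
    scores = [sum(x[i] * W[i][c] for i in range(len(vocab)))
              for c in range(len(categories))]
    return sorted(zip(categories, scores), key=lambda p: p[1], reverse=True)

# Hypothetical training set of manually categorized texts.
train = [
    ("heart attack", ["cardiology"]),
    ("lung cancer", ["oncology"]),
    ("chest pain heart", ["cardiology"]),
]
categories = ["cardiology", "oncology"]
vocab = sorted({w for text, _ in train for w in text.split()})

X = [vectorize(text, vocab) for text, _ in train]
Y = [[1.0 if c in labels else 0.0 for c in categories] for _, labels in train]
W = fit_llsf(X, Y)

print(rank_categories("heart attack", W, vocab, categories))
```

Each row of W holds the learned association between one word and every category, so scoring a new text is a single vector-matrix product. The retrieval case described in the abstract has the same shape, with query words in place of document words and indexing terms in place of categories.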

