ABSTRACT
In a new method for automatic indexing and retrieval, implicit higher-order structure in the association of terms with documents is modeled to improve estimates of term-document association, and therefore the detection of relevant documents on the basis of terms found in queries. Singular-value decomposition is used to decompose a large term-by-document matrix into 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination; both documents and terms are represented as vectors in a 50- to 150-dimensional space. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents are ordered by their similarity to the query. Initial tests find this automatic method very promising.
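The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. Several choices here are assumptions not stated in the abstract: the function names (`lsi_fit`, `fold_in_query`, `rank_documents`) are hypothetical, the toy matrix is invented, the query is folded in as q^T T_k S_k^{-1}, both sides are scaled by the singular values before comparison, and cosine is used as the similarity measure. A real collection would use 50 to 150 factors rather than 2.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix: X ~ T_k diag(s_k) D_k^T."""
    T, s, Dt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k largest factors (singular values come back sorted).
    return T[:, :k], s[:k], Dt[:k, :].T

def fold_in_query(query_weights, T_k, s_k):
    """Pseudo-document vector for a query (assumed form: q^T T_k S_k^{-1})."""
    return (query_weights @ T_k) / s_k

def rank_documents(q_hat, D_k, s_k):
    """Order documents by cosine similarity to the query in the factor space."""
    q = q_hat * s_k                      # scale query by singular values
    docs = D_k * s_k                     # scale each document row the same way
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q)
    sims = (docs @ q) / np.where(norms == 0.0, 1.0, norms)
    return np.argsort(-sims), sims

# Toy collection (invented for illustration): rows are terms, columns are documents.
terms = ["human", "computer", "interface", "graph", "tree"]
X = np.array([
    [1, 1, 0, 0],   # human
    [1, 0, 0, 0],   # computer
    [0, 1, 0, 0],   # interface
    [0, 0, 1, 1],   # graph
    [0, 0, 1, 1],   # tree
], dtype=float)

T_k, s_k, D_k = lsi_fit(X, k=2)

# Query containing only the term "computer".
query = np.array([0, 1, 0, 0, 0], dtype=float)
order, sims = rank_documents(fold_in_query(query, T_k, s_k), D_k, s_k)
print("ranking:", order, "similarities:", np.round(sims, 3))
```

In this toy collection, document 1 does not contain the term "computer", yet it lands next to document 0 in the reduced space because both share "human". That is the kind of higher-order association the method is meant to capture.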
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
ATHERTON65
Atherton, P. and Borko, H. A test of factor-analytically derived automated classification methods. AIP rept AIP-DRP 65-1, Jan. 1965.

BAKER62

BATES86
Bates, M.J. Subject access in online catalogs: A design model. JASIS, 1986, 37(6), 357-376.

BORKO63

CARROLL70
Carroll, J.D. and Chang, J.J. Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition. Psychometrika, 1970, 35, 283-319.

CARROLL80
Carroll, J.D. and Arabie, P. Multidimensional scaling. In M.R. Rosenzweig & L.W. Porter (Eds.). Annual Review of Psychology, 1980, 31, 607-649.

COOMBS64
Coombs, C.H. A Theory of Data. New York: Wiley, 1964.

DESARBO85
Desarbo, W.S. and Carroll, J.D. Three-way metric unfolding via alternating weighted least squares. Psychometrika, 1985, 50(3), 275-300.

DUMAIS89
Dumais, S.T., Deerwester, S., Furnas, G.W., Landauer, T.K., and Harshman, R. Indexing by Latent Structure Analysis. Journal of the American Society for Information Science, 1989, in press.

FIDEL85
Fidel, R. Individual variability in online searching behavior. In C.A. Parkhurst (Ed.). ASIS'85: Proceedings of the ASIS 48th Annual Meeting, Vol. 22, October 20-24, 1985, Las Vegas, 69-72.

FORSYTHE77

FURNAS80
Furnas, G.W. Objects and their features: The metric representation of two-class data. Ph.D. Dissertation. Stanford University, 1980.

FURNAS83
Furnas, G.W., Landauer, T.K., Dumais, S.T., and Gomez, L.M. Statistical semantics: Analysis of the potential performance of key-word information systems. Bell System Technical Journal, 1983, 62(6), 1753-1806.

HARSHMAN70
Harshman, R.A. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics, 1970, 16, 86pp.

HARSHMAN84
Harshman, R.A. and Lundy, M.E. Data preprocessing and the extended PARAFAC model. In H.G. Law, C.W. Snyder, Jr., J.A. Hattie, and R.P. McDonald (Eds.). Research Methods for Multimode Data Analysis, Praeger, 1984b.

HARSHMAN84
Harshman, R.A. and Lundy, M.E. The PARAFAC model for three-way factor analysis and multi-dimensional scaling. In H.G. Law, C.W. Snyder, Jr., J.A. Hattie, and R.P. McDonald (Eds.). Research Methods for Multimode Data Analysis, Praeger, 1984a.

HEISER81
Heiser, W.J. Unfolding Analysis of Proximity Data. Leiden, The Netherlands: Reprodienst Psychologie RUL, 1981.

JARDIN71
Jardin, N. and van Rijsbergen, C.J. The use of hierarchic clustering in information retrieval. Information Storage and Retrieval, 1971, 7, 217-240.

KOLL79

KRUSKAL78
Kruskal, J.B. Factor analysis and principal components: Bilinear methods. In W.H. Kruskal and J.M. Tanur (Eds.). International Encyclopedia of Statistics, New York: Free Press, 1978.

LILEY54
Liley, O. Evaluation of the subject catalog. American Documentation, 1954, 5(2), 41-60.

OSSORIO66
Ossorio, P.G. Classification space: A multivariate procedure for automatic document indexing and retrieval. Multivariate Behavioral Research, October 1966, 479-524.

SALTON68

SALTON83

SPARCK71
Sparck Jones, K. Automatic Keyword Classification for Information Retrieval, Butterworths, London, 1971.

SPARCK72
Sparck Jones, K. A statistical interpretation of term specificity and its applications in retrieval. Journal of Documentation, March 1972, 28(1), 11-21.

STREETER87
Streeter, L.A. and Lochbaum, K.E. An expert/expert-locating system based on automatic representation of semantic structure. Proceedings of the Fourth Conference on Artificial Intelligence Applications. March 14-18, 1987, San Diego, CA., pp. 345-350.

TARR74
Tarr, D. and Borko, H. Factors influencing inter-indexer consistency. In Proceedings of the ASIS 37th Annual Meeting, Vol. 11, 1974, 50-55.

CITED BY 43
Michael W. Berry, Susan T. Dumais, Todd A. Letsche, Computational Methods for Intelligent Information Access, Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), p.20, December 04-08, 1995, San Diego, California, United States
Renato Bulcao Neto, Claudia Akemi Izeki, Maria da Graça Pimentel, Renata Pontin Fortes, Khai Nhut Truong, An open linking service supporting the authoring of web documents, Proceedings of the 2002 ACM symposium on Document engineering, November 08-09, 2002, McLean, Virginia, USA
Abbe Don, Tim Oren, Brenda Laurel, Guides 3.0, Proceedings of the SIGCHI conference on Human factors in computing systems: Reaching through technology, p.447-448, April 27-May 02, 1991, New Orleans, Louisiana, United States
Xia Lin, Dagobert Soergel, Gary Marchionini, A self-organizing semantic map for information retrieval, Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval, p.262-269, October 13-16, 1991, Chicago, Illinois, United States
Alessandra A. Macedo, Laércio Baldochi, Jr., José A. Camacho-Guerrero, Renan G. Cattelan, Maria Da Pimentel, Automatically linking live experiences captured with a ubiquitous infrastructure, Multimedia Tools and Applications, v.37 n.2, p.93-115, April 2008
Debra T. Haley, Pete Thomas, Anne De Roeck, Marian Petre, Measuring improvement in latent semantic analysis-based marking systems: using a computer to mark questions about HTML, Proceedings of the ninth Australasian conference on Computing education, p.35-42, January 30-February 02, 2007, Ballarat, Victoria, Australia
José A. Camacho-Guerrero, Alex A. Carvalho, Maria G. C. Pimentel, Ethan V. Munson, Alessandra A. Macedo, Clustering as an approach to support the automatic definition of semantic hyperlinks, Proceedings of the eighteenth conference on Hypertext and hypermedia, September 10-12, 2007, Manchester, UK