Generation and search of clustered files
Source: ACM Transactions on Database Systems (TODS)
Volume 3, Issue 4 (December 1978)
Pages: 321-346
Year of Publication: 1978
ISSN: 0362-5915
Authors
G. Salton, Cornell Univ., Ithaca, NY
A. Wong, Cornell Univ., Ithaca, NY
Publisher
ACM, New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 10, Downloads (12 Months): 47, Citation Count: 29
DOI: http://doi.acm.org/10.1145/320289.320291

ABSTRACT

A classified, or clustered, file is one in which related or similar records are grouped into classes, or clusters, of items in such a way that all items within a cluster are jointly retrievable. Clustered files are easily adapted to broad and narrow search strategies, and simple file updating methods are available. An inexpensive file clustering method applicable to large files is given, together with appropriate file search methods. An abstract model is then introduced to predict the retrieval effectiveness of various search methods in a clustered file environment. Experimental evidence is included to test the versatility of the model and to demonstrate the role of various parameters in the cluster search process.
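
The abstract does not spell out the clustering or search procedures, so the following is only an illustrative sketch of the general idea of searching a clustered file, not the method of the article. It assumes records are sparse term-weight vectors, each cluster is summarized by the average of its records (a centroid), and similarity is measured by cosine. The names `cosine`, `centroid`, `make_cluster`, `cluster_search`, and the parameter `n_clusters` are hypothetical. Searching only the best-matching cluster corresponds to a narrow strategy; examining more clusters gives a broader one.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average of sparse vectors; an assumed stand-in for a cluster profile."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def make_cluster(records):
    """Bundle a group of records (id -> vector) with its centroid."""
    return {"records": records, "centroid": centroid(list(records.values()))}


def cluster_search(query, clusters, n_clusters=1):
    """Rank clusters by centroid similarity, then rank only the records
    stored in the best n_clusters clusters.

    n_clusters=1 is a narrow search; a larger value is a broader search.
    Returns a list of (similarity, record_id), best match first.
    """
    ranked = sorted(
        clusters, key=lambda c: cosine(query, c["centroid"]), reverse=True
    )
    candidates = [
        (rid, vec)
        for c in ranked[:n_clusters]
        for rid, vec in c["records"].items()
    ]
    return sorted(((cosine(query, vec), rid) for rid, vec in candidates),
                  reverse=True)


# Toy data, invented for illustration only.
clusters = [
    make_cluster({
        "d1": {"cluster": 1.0, "file": 1.0},
        "d2": {"cluster": 1.0, "search": 1.0},
    }),
    make_cluster({
        "d3": {"btree": 1.0, "index": 1.0},
        "d4": {"btree": 1.0, "update": 1.0},
    }),
]
query = {"cluster": 1.0, "search": 1.0}

narrow = cluster_search(query, clusters, n_clusters=1)  # one cluster examined
broad = cluster_search(query, clusters, n_clusters=2)   # both clusters examined
```

In this toy setup the narrow search compares the query against two centroids and two records, while the broad search also scores the records of the second cluster. The trade-off between the number of clusters examined and the retrieval results is the kind of parameter the abstract says the article studies.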

